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Abstract  of  Dissertation  Presented  to  the  Graduate  School 
of  the  University  of  Florida  in  Partial  Fulfillment  of  the 
Requirements  for  the  Degree  of  Doctor  of  Philosophy 

A  RAY  CASTING  VISIBILITY  ALGORITHM  FOR  CUBIC  SURFACES 

By 

I§m  Giirses  Biiyiiklimanli 
May  1991 

Chairman:  Dr.  John  Staudhammer 

Major  Department:  Computer  and  Information  Sciences 

An  algorithm  for  ray  casting  parametric  surfaces  is  developed  for  visibility 
calculation.  The  new  algorithm  solves  the  ray/surface  intersection  directly  using 
Newton  iteration  and  utilizes  ray-to-ray  coherence.  The  computation  required  is 
reduced  using  bounding  boxes  as  bounding  volumes,  coherence,  and  forward 
differencing  methods.  A  parallel  implementation  of  the  algorithm  is  also  presented 
taking  advantage  of  the  parallelism  of  ray  casting  to  further  reduce  the  algorithm's 
computational  requirements.  The  final  parametric  surface  intersection  algorithm 
gives  a  good  combination  of  speed  and  accuracy.  The  algorithm  seems  to  be  well 
suited  for  hardware  implementation. 

Algorithms  as  well  as  various  architectures  are  reviewed  and  some  of  them  are 
also  compared  to  the  new  algorithm  to  prove  its  efficiency.  A  system  simulator  is 
developed  to  verify  the  algorithm  and  the  functional  behavior  of  the  system 
architecture. 

V 


CHAPTER  1 
INTRODUCTION 


Computer  graphics  produces  pictures  generated  by  a  computer  programmed 
and  operated  by  human  beings.  Only  the  imagination  of  the  user  and  the  capability 
of  the  hardware  limits  the  form  they  can  take:  animation,  simulation,  graphs, 
recreations,  portraits,  architecture,  drawings,  paintings,  or  just  simple  lines.  Because 
computer  graphics  is  a  way  of  making  thoughts  and  ideas  visible,  these  revolutionary 
images  are  becoming  the  universal  language  of  our  time  [WARD89]. 

The  design  and  manufacture  of  aircraft,  turbo  machinery,  and  automobiles, 
and  recent  applications  in  the  advertising  and  animation  industries  have  become 
driving  forces  behind  research  into  techniques  for  surface  design.  Curves  and 
surfaces  having  free-form  shapes  such  as  faces,  clothes,  shoes,  clouds,  mountains,  car 
bodies,  airplanes,  are  important  in  the  construction  of  three-dimensional  objects  for 
image  synthesis  in  computer  graphics.  The  vast  majority  of  curved  surfaces  may  be 
defined  with  a  relatively  few  numbers  by  using  piecewise  continuous  surface  patches 
[BART87].  However,  the  display  of  these  surfaces,  the  rendering,  on  a 
high-definition  display  requires  the  computation  of  coloring  on  a  pixel-by-pixel  basis. 
Clearly  this  can  be  a  computationally  intensive  task,  even  though  the  color  variations 
are  often  slight. 
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One  of  the  more  widely  used  methods  for  finely  rendering  objects  on  a  display 
screen  is  ray  casting.  As  the  name  implies,  the  procedure  calls  for  calculating  the 
light  properties  of  a  ray  of  light  which  travels  from  the  surface  elements  in  a  pixel 
to  the  observer.  Ray  casting  is  a  powerful  yet  simple  approach  to  image  generation. 
This  method  has  become  one  of  the  important  tools  from  which  highly  realistic 
images  are  generated.  But  it  also  has  disadvantages.  The  major  drawback  of  this 
procedure  is  the  great  amount  of  time  to  compute  ray/surface  intersections.  In  this 
dissertation,  this  problem  is  attacked  and  a  new  ray  casting  visibility  algorithm  is 
devised.  This  algorithm  has  been  compared  with  some  algorithms  that  are  widely 
used.  The  results  show  that  its  execution  time  is  a  bit  better  than  the  others.  It  has 
unique  properties:  it  is  short,  elegant  and  has  some  interesting  applications. 

1.1  Research  Motivation  and  Objective 
This  dissertation  was  motivated  by  the  lack  of  efficient  ray  casting  algorithms 
for  curved  surfaces.  This  work  especially  deals  with  ray  casting  of  parametric 
surfaces.  The  parametric  method  of  surface  representation  is  very  convenient  for 
approximation  and  design  of  curved  surfaces.  In  particular,  Bezier  surface  patches 
and  B-Spline  surfaces  are  extensively  used  in  computer  graphics  and  CAD.  The 
research  has  focused  on  efficient  and  fast  algorithm  design  with  the  objective  of 
determining  the  ray/object  intersections  as  fast  as  possible.  The  research  has  also 
addressed  both  the  efficiency  and  speed  of  existing  algorithms  and  analyzed 
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deficiencies  of  those  algorithms.  This  process  led  to  a  new  approach  to  resolve  those 
deficiencies. 

As  the  first  part  of  the  research,  the  Newton-Raphson  method  is  reformulated 
for  Bezier  surfaces  to  reduce  the  complexity  of  the  ray/object  intersections.  Since 
every  ray  casting  algorithm  is  by  nature  an  object-oriented  one,  this  characteristic 
made  the  design  and  implementation  attractive  by  applying  object-oriented  concepts 
and  ideas. 

For  applications  that  require  fast  transformation  such  as  ray  casting  and 
rendering  of  complex  shaded  images,  a  parallel,  pipelined  multiprocessor  architecture 
can  be  used  to  achieve  high  real-time  performance  for  image  synthesis.  A  parallel 
architecture  offers  an  increase  in  speed  that  grows  almost  linearly  with  the  number 
of  processors  used.  To  extract  the  maximum  efficiency  from  a  parallel  architecture, 
key  bottlenecks  must  be  identified  and  an  architecture  that  is  best  suited  to 
overcoming  those  bottlenecks  must  be  implemented.  Since  the  calculation  for  a  ray 
cast  through  a  pixel  in  the  screen  is  independent  of  the  calculation  for  other  rays 
cast,  this  process  can  be  implemented  in  parallel.  Therefore,  as  a  second  part  of  this 
research,  a  VLSI-oriented  real-time  algorithm  is  developed  that  combines  efficiency, 
simplicity,  speed,  and  parallelism.  Using  an  incremental  technique,  the  computation 
for  the  intersections  of  a  ray  with  surfaces  is  reduced.  The  algorithm  is  designed  in 
a  way  that  each  intersection  processor  can  be  placed  in  a  VLSI  chip.  In  a  practical 
display  system  several  of  these  chips  may  be  used  to  calculate  surface-ray 
intersections  at  pixels  or  groups  of  pixels. 
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1.2  Dissertation  Overview 
This  dissertation  is  concerned  with  the  design  of  an  efficient  algorithm  for 
display  purposes.  It  is  organized  into  six  chapters.  Chapter  1  is  an  introductory 
chapter- that  covers  objectives  and  background  about  the  dissertation  subject.  In 
chapter  2  a  review  is  presented  of  the  basic  concepts  used  devising  the  algorithm. 
Chapter  3  defines  the  problem  and  gives  a  review  of  existing  algorithms.  Chapter  4 
discusses  the  details  of  the  algorithm  and  its  complexity  and  performance  analysis. 
Chapter  5  describes  the  techniques  to  reduce  the  computation  required  and  parallel 
implementation  of  the  algorithm  and  complexity  and  performance  analysis  results  are 
given.  Finally,  chapter  6  summarizes  the  results  along  with  suggestions  for  future 
work.  In  summary,  the  major  contributions  of  this  work  are  the  design  of  an  efficient 
ray/object  intersection  algorithm,  and  improvements  on  this  algorithm  to  make  it 
suitable  for  parallel  hardware  implementation. 


CHAPTER  2 
REVIEW  OF  BASIC  CONCEPTS  USED 

2.1  Image  Synthesis 

Image  synthesis  is  the  subfield  of  computer  graphics  that  is  concerned  with  the 
production  of  pictures  defined  by  numbers  and  procedures.  If  the  image  is  to  appear 
true  to  a  physical  object,  the  process  may  be  costly,  complex  and  may  require  a 
tremendous  amount  of  computing  power.  There  is  a  tradeoff  present  in  all  image 
synthesis  applications  between  realism  and  the  amount  of  required  computations. 

The  image  synthesis  is  usually  accomplished  with  a  series  of  processes,  termed 
the  image  synthesis  pipeline.  Based  on  a  simulation  of  light  propagation  in  the  real 
world,  the  image  synthesis  pipeline  consists  of  several  steps:  creating  data 
representations,  geometry  processing  (modeling  transformation,  viewing  operation), 
rasterization  (visible-surface  determination,  scan  conversion,  shading),  and  displaying 
the  result.  Geometry  processing  is  independent  of  the  display  device,  whereas  the 
rasterization  pipeline  takes  transformed,  clipped  primitives  and  produces  pixels.  The 
display  process  that  maps  a  model  to  an  image  on  screen  is  called  rendering,  and  its 
implementation  in  software  and/or  hardware  is  referred  to  as  the  rendering  pipeline. 
This  standard  graphics  rendering,  the  image  synthesis  pipeline,  is  shown  in  Figure  2.1. 
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Figure  2.1  Standard  Rendering  Pipeline 
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The  object  database  contains  geometric,  optical  and  material  information  of  the 
objects  (also  called  primitives).  The  geometry  representation  of  each  object  depends 
on  the  type  of  primitive  used  by  the  image  renderer.  The  type  of  object  used  here 
is  a  curved  surface,  whose  characteristics  will  be  explained  later.  To  render  the  scene 
image  correctly,  the  object  is  transformed  from  modeling  coordinates  to  world 
coordinates.  Next,  world  coordinate  descriptions  are  converted  to  viewing 
coordinates.  That  is,  the  scene  to  be  viewed  is  clipped  against  a  view  volume 
(frustum)  and  projected  into  the  view  area  defined  on  the  view  plane.  There  are  two 
basic  methods  for  projecting  3D  objects  onto  a  2D  viewing  surface.  All  points  of  the 
object  can  be  projected  to  the  surface  along  parallel  lines,  or  the  points  can  be 
projected  along  lines  that  converge  to  a  position  called  the  center  of  the  projection. 
These  two  methods  are  called  parallel  projection  and  perspective  projection, 
respectively.  A  parallel  projection  preserves  relative  dimensions  of  objects  (this  is 
the  technique  used  in  drafting  to  produce  scale  drawings  of  three-dimensional 
objects)  but  does  not  give  a  realistic  representation.  A  perspective  projection,  on  the 
other  hand,  produces  realistic  views  but  doesn't  preserve  relative  dimensions.  The 
farther  a  primitive  or  part  of  a  primitive  is  from  the  viewer,  the  smaller  it  will  be 
drawn.  This  effect,  known  as  perspective  foreshortening,  is  similar  to  the  effect 
achieved  by  a  photograph  and  by  the  human  visual  system.  The  next  step  in  the 
image  generation  process  is  visible-surface  calculation,  also  called  hidden-surface 
removal.  Visible-surface  determination  is  the  process  of  determining  which  portions 
of  a  primitive  are  actually  visible  from  the  synthetic  camera's  point  of  view.  Most 
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hidden-surface  algorithms  use  image-space  methods  and  in  this  study  the  ray  casting 
technique  as  a  point-sampling  and  image-space  algorithm  is  used.  Ray  casting  is 
compatible  with  the  image  compositing  network  designed  by  Fleischman  [FLEI88]. 
After  determining  the  pixels  covered  by  a  primitive's  image,  scan  conversion,  the 
color  for  each  covered^ixel  is  determined  through  the  use  of  lighting  equations,  i.e. 
shading.  This  color  value,  along  with  its  z  coordinate  and  opacity,  is  passed  to  the 
compositing  network  for  final  display. 

2.1.1  Visibility  Determination 

The  task  of  visibility  determination  of  an  object  on  a  raster  display  has  been 
originally  referred  to  as  the  hidden-surface  problem.  Different  methods  for  visible- 
surface  calculation  are  well  established.  It  can  be  categorized  into  two  groups:  point 
sampling  algorithms  and  continuous  algorithms.  While  Z-buffer,  ray  casting/tracing, 
painter's,  and  scan-line  algorithms  belong  to  the  first  group,  area/volume  subdivision 
and  scan-plane  algorithms  belong  to  the  second.  Hidden-  surface  algorithms  are  also 
classified  according  to  whether  they  deal  with  object  definitions  directly  or  with  their 
projected  images.  These  two  approaches  are  called  object-space  methods  and  image- 
space  methods,  respectively.  An  object-space  method  implemented  in  the  physical 
coordinate  system  compares  objects  and  parts  of  objects  to  each  other  to  determine 
which  surfaces  are  visible.  These  results  are  generally  accurate  to  the  precision  of 
the  machine.  In  an  image-space  algorithm,  visibility  is  decided  point  by  point  at  each 
pixel  position  on  the  projection  plane.  That  is,  calculations  are  performed  only  to 
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the  precision  of  the  screen.  Therefore,  image-space  algorithms  are  also  called  point- 
sampling  algorithms.  The  computational  work  for  an  object-space  algorithm  that 
compares  each  object  in  a  scene  with  the  remaining  (n-1)  objects  grows  as  the 
number  of  objects  squared  (n^).  Similarly,  the  work  for  an  image-space  algorithm 
which  compares  every  object  in  the  scene  with  every  pbcel  location  in  screen 
coordinates  is  proportional  to  nN,  where  N  is  the  number  of  pixels,  typically  (512^) 
or  more.  While  one  might  therefore  believe  that  object-space  algorithms  require  less 
work  than  image-space  algorithms  even  when  n  exceeds  100,000,  in  practice,  this  is 
not  the  case.  Most  of  the  individual  steps  in  object-space  algorithms  are  more  time- 
consuming  while  it  is  easier  to  take  advantage  of  coherence  in  raster  scan 
implementation  of  image-space  algorithms. 

Visibility  determination  is  a  fundamental  and  recurring  subproblem  in 
computer  graphics.  It  is  done  in  a  3D  space  prior  to  the  projection  into  2D  that 
destroys  the  depth  information  needed  for  depth  comparisons.  The  visible-surface 
problem  for  the  simple  case  (high-order  visibility  problems  being  such  as  those 
calculating  motion  blur,  depth  of  field,  or  indirect-illumination  effects)  can  be  stated 
as  follows: 
Given 

a  set  of  surfaces  in  three  dimensions, 
a  viewpoint, 

an  oriented  image  plane, 
and  a  field  of  view, 
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For  each  pixel  on  the  image  plane  within  the  field  of  view  do  { 
Determine  which  pixel  on  which  surface  lies  closest  to 
the  viewpoint  along  a  line  from  the  viewpoint  through 
the  pixel  in  the  image  plane. 
Draw  the  pixel  in  The  appropriate  color. 

} 

The  visibility  determination  of  curved  surfaces  involves  real-time  image 
generation  process.  Ray  casting  determines  the  visibility  of  surfaces  by  tracing 
imaginary  rays  of  light  from  the  viewer's  eye  to  the  objects  in  the  scene.  This  is  a 
typical  image-space  algorithm  discussed  before  and  will  be  used  throughout  this 
dissertation  for  visibility  determination. 

2.1.2  Rav  Casting 

Although  ray  casting  has  been  around  before  computers,  it  became  popular 
in  the  computer  graphics  community  through  the  paper  by  Whitted  [WHIT80].  Ray 
casting  is  a  technique  which  accomplishes  a  mapping  from  3D  display  to  2D  image 
plane  by  shooting  a  ray  from  the  eye  through  each  pixel  on  the  screen  to  the  world 
scene  to  identify  the  visible  surface  [KAUF86,  ROTH82].  In  the  literature,  ray 
casting  and  ray  tracing  are  often  used  synonymously,  defining  a  full  recursive 
algorithm  that  integrates  both  visible  surface  determination  and  shading.  However, 
in  this  dissertation,  the  use  of  ray  tracing  or  ray  casting  refers  only  to  visible  surface 
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Figure  2.2  Rendering  Pipeline  for  Ray  Casting. 
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algorithm.  There  is  no  tracing  of  additional  rays  from  the  points  of  intersection  to 
determine  a  pixel's  shade. 

As  mentioned  before,  the  rendering  pipeline  is  a  complex  process,  but  the  ray 
casting  pipeline  is  the  simplest  among  others  because  those  objects  that  are  visible 
at  each  pixel  and  their  iirumination  are  determined  entirely  in  world  coordinates  (see 
Figure  2.2).  Once  objects  have  been  obtained  from  the  database  and  transformed 
by  the  viewing  transformation,  they  are  loaded  into  ray  caster's  world  coordinate 
database. 

Ray  casting  is  one  of  the  more  widely  used  methods  for  finely  rendering 
objects  on  a  display  screen.  The  ray  comes  from  the  light  source  and  hits  the  object. 
Some  reach  the  observer  and  let  the  observer  see  the  scene.  If  we  trace  the  rays  like 
this  (from  the  light  source  to  the  observer),  very  few  will  reach  the  observer.  Since 
this  is  obviously  not  an  efficient  way,  we  cast  the  rays  in  the  opposite  direction,  from 
the  eye  through  each  pixel  on  the  screen  to  the  world  scene.  The  first  surface  that 
an  individual  ray  intersects  will  be  the  surface  that  is  displayed  in  that  particular 
pixel.  The  ray  casting  process  is  shown  in  Figure  2.3.  This  approach  avoids  the  need 
to  explicitly  define  silhouette  edges  for  the  general  curved  patch  models  and  it  also 
allows  illumination  effects  to  be  modelled  in  a  direct  way. 

The  problem  here  is  intersecting  the  rays  with  a  large  collection  of  primitive 
objects  defining  an  environment.  The  time  spent  to  compute  ray/surface 
intersections  is  the  bottleneck  of  the  ray  casting  algorithm.  For  each  pixel  in  the 
raster,  the  corresponding  ray  is  traced  to  the  first  surface  that  it  encounters  to 
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determine  which  objects  in  the  scene  are  intersected  by  the  ray.  Even  for  a  screen 
resolution  of  512x512,  250,000  rays  should  be  traced  (generated).  The  points  of 
intersection  of  this  ray  with  the  surfaces  are  computed  and  the  intersections  are 
sorted  in  depth.  The  surface  closest  to  the  image  plane  is  the  visible  surface  through 
a  particular  pixel.  The  general  algorithm  for  the  ray  casting  is  as  follows: 

Read  in  the  parameters  for  the  scene  from  the  model  file 
Perform  preprocessing 
for  y  =  miny  to  maxy 
for  X  =  minx  to  maxx 

send  a  ray  from  origin  through  pixel  (x,y) 
for  object  =  1  to  num_of_objects 

determine  closest  object  to  origin  along  ray 
determine  lighting  of  closest  object 
color  pixel  (x,y) 

The  ray  casting  algorithm  handles  one  pixel  at  a  time  and  compares  every 
object's  depth  at  this  pixel.  As  Whitted  [WHIT80]  noted,  it  can  take  as  much  as  95 
percent  of  the  total  processing  time  just  for  the  ray/surface  intersection  calculations. 
There  are  two  popular  directions  which  can  be  considered  for  speeding  up  ray 
casting.  One  way  is  to  reduce  the  number  of  candidates  to  be  checked  against  each 
ray,  at  the  expense  of  a  little  extra  processing.  Another  way  is  to  design  a  more 
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efficient  ray/surface  intersection  procedure.  The  techniques  discussed  in  this 
dissertation  belong  to  these  two  directions. 

One  approach  is  called  space  subdivision.  The  idea  is  that  we  can  look  at  the 
world  that  surrounds  the  database  and  cut  it  up  into  many  little  volumes  called 
voxels.  As  the  ray  moves  through  the  world  looking  for  the  first  object  it  hits,  it 
essentially  moves  from  volume  to  volume.  If  each  volume  contains  a  list  of  the 
objects  within  it,  then  we  only  need  to  check  the  ray  against  those  objects.  If  we  do 
get  a  hit,  then  we  do  not  need  to  check  the  ray  against  all  the  other  objects  in  the 
database;  otherwise  we  move  to  the  next  volume  and  try  again.  There  are  two 
subdivision  schemes.  In  the  uniform  subdivision  proposed  by  Fujimoto  et  al. 
[FUJI86]  and  also  described  by  Cleary  and  Wyvil  [CLEA88],  the  object  space  is 
partitioned  into  voxels  along  three  major  axes  by  a  regular  3D  grid.  Glassner 
[GLAS84]  and  Kaplan  [KAPL85]  describe  adaptive  subdivision  in  which  a 
hierarchical  tree  known  as  octree  is  constructed  to  represent  the  given  space  by 
recursively  subdividing  it  into  eight  equal  size  and  disjoint  voxels. 

Another  approach  is  the  bounding  volume  technique.  A  simple  bounding 
volume  such  as  a  sphere  or  a  box  is  placed  around  each  object.  Using  bounding 
volumes  results  in  a  significant  gain  in  efficiency.  Only  if  a  ray  intersects  the 
bounding  volume  does  the  object  itself  need  to  be  checked  for  intersection.  Though 
this  actually  increases  the  amount  of  computation  for  rays  which  come  near  enough 
to  an  object  to  pierce  its  bounding  volume,  in  a  typical  environment  most  rays  closely 
approach  only  a  small  fraction  of  the  objects.  Used  in  this  way,  bounding  volumes 
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substitute  simple  intersection  checks  for  more  costly  ones  but  do  not  decrease  their 
number.  From  a  theoretical  standpoint  this  may  reduce  the  computation  by  a 
constant  factor  but  cannot  improve  upon  the  linear  time  complexity  of  exhaustive  ray 
tracing.  To  alleviate  this  problem,  Rubin  and  Whitted  [RUBI80]  introduced  the 
notion  of  hierarchical  bounding  volumes  to  ray  tracing  in  order  to  attain  a  theoretical 
time  complexity  v^^hich  is  logarithmic  in  the  number  of  objects  instead  of  being  linear. 
The  method  of  hierarchical  extents  uses  a  tree  search  to  find  the  objects  hit  by  a  ray. 
In  the  best  case  it  yields  O(logn)  intersection  calculations  per  ray.  Since  tree  nodes 
are  small  in  comparison  to  object  data,  only  about  30  percent  extra  space  is  required 
to  store  the  hierarchy  required  for  machine  use  in  a  typical  implementation.  The 
idea  is  further  developed  by  Kajiya  [KAJI83],  Weghorst  et  al.  [WEGH84],  and  Kay 
and  Kajiya  [KAY86].  In  the  algorithm  proposed  here,  to  eliminate  unnecessary 
intersection  tests,  the  intersection  of  a  ray  with  a  volume  bounding  the  object  is 
tested.  There  are  two  criteria  for  a  good  bounding  volume.  First,  the  volume  must 
be  easily  tested  for  an  intersection  with  a  ray.  The  bounding  volume  intersection  test 
requires  significantly  less  computation  time  than  testing  against  the  original  object. 
Second,  although  the  bounding  volume  may  be  simple  to  test,  it  should  be  as  tight 
a  fit  about  the  object  as  possible.  With  a  tighter  fit,  the  rays  that  hit  the  bounding 
volume  will  be  minimized.  The  bounding  box  scheme  used  here  is  similar  to  the 
slab-plane  scheme  developed  by  Kay  and  Kajiya  [KAY86]  and  later  adopted  by  Hsu 
[HSU90]  in  his  dissertation.  The  construction  of  bounding  boxes  will  be  explained 
in  detail  in  chapter  4. 
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2.2  Curved  Surfaces 

The  use  of  straight  Hne  segments  and  polygons  to  approximate  curved  lines 
and  surfaces  results  in  visually  objectionable  images,  even  with  the  most  sophisticated 
continuous-shading  models.  Mach  bands  appear  at  the  borders  between  adjacent 
polygons,  and  we  always- see  a  polygonal  silhouette.  Also,  polygonal  methods  often 
require  excessive  amounts  of  storage.  However,  curved  surface  techniques  allow  the 
resulting  image  to  be  computed  to  whatever  level  detail  the  situation  demands. 
Curved  surface  representations  are  important  in  computer  graphics  applications, 
because  they  are  often  more  compact  than  polygonal  representations  and  hence 
require  relatively  compact  databases. 

Surfaces  may  be  represented  in  general  by  the  3-space  relationship  F(x,y^)  =  0. 
This  is  the  implicit  representation  of  a  surface.  Unfortunately,  the  determination  of 
the  depth  (z)  of  a  surface  point  from  a  known  ray  direction  ix,y)  involves  the  solution 
of  an  implicit  relationship,  which  may  become  computationally  difficult,  even 
untractable.  On  the  other  hand,  the  use  of  parametric  curves  and  surfaces  for  object 
modeling  in  computer  graphics  is  becoming  increasingly  popular.  Parametric 
representation  is  the  most  direct  and  most  flexible  description  of  a  surface.  It  has 
many  desirable  properties  [FAUX79,  MUDU85,  ROGE76]  : 
1.  It  allows  a  smooth  object  with  no  analytical  definition  to  be  approximated  with  a 
relatively  small  number  of  surface  patches  that  join  with  continuity,  thereby  saving 
considerable  memory  space  in  representing  the  object. 
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2.  It  is  axis  independent:  it  avoids  infinite  slope  values  with  respect  to  some  arbitrary 
axis  system.  Difficulties  arise  using  axis-dependent,  nonparametric  curves  when  the 
end  point  of  a  curve  has  a  vertical  slope  relative  to  the  chosen  coordinate  system. 

3.  It  facilitates  the  representation  of  space  curves  in  homogeneous  coordinates. 

4.  Transformations  can  Be  performed  directly  on  parametric  equations.  Translation 
or  rotation  of  the  axes  or  the  object  can  usually  be  carried  out  by  translating  or 
rotating  the  vectors  defining  the  curve  or  patch,  without  modifying  the  functions  of 
the  parameters  used. 

5.  It  allows  the  unambiguous  representation  of  multivalued  surfaces  or  space 
functions. 

6.  It  doesn't  involve  solution  of  nonlinear  equations  for  each  point  (as  in  the  case  of 
implicit  representation). 

7.  Points  on  the  surface  can  easily  be  computed  sequentially  along  parametric  lines 
in  the  surface  for  display  purposes. 

8.  Parametric  surface  patches  lend  themselves  to  piecewise  constructions  more 
naturally  than  do  implicit  surfaces.  Two  reasons  for  this  are  that  parametric  patches 
are  defined  over  a  finite  domain  and  they  have  distinct  boundary  curves.  By  contrast, 
implicit  surfaces  can  be  of  infinite  extent. 

9.  Parametric  equations  usually  offer  more  degrees  of  freedom  for  controlling  the 
shape  of  curves  and  surfaces.  For  example,  a  2D  explicit  curve  of  the  form 
y  =  cc^+bx^+cx+d  has  four  coefficients  that  we  may  vary  to  control  the  curve. 
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However,  a  2D  parametric  cubic  curve  of  the  form  x  =  ai^+bu^+cu+d, 
y  =  ai+bu^+gu+h  has  eight  coefficients  that  may  be  used  for  shape  comrol. 

Curved  surfaces  can  be  described  by  parametric  bicubic  patches,  Bezier 
patches,  and  B-spline  patches  which  are  defined  by  cubic  equations  of  two 
parameters,  u  and  v  [FOLE82].  Varying  both  parameters  from  0  to  1  defines  all 
points  on  a  surface  patch.  Cubic  patches  are  determined  by  a  collection  of  16  3D 
control  points.  Because  these  patches  can  be  pieced  together  while  maintaining 
second-order  continuity  at  the  boundaries,  they  are  very  useful  for  modeling  smooth 
surfaces. 

Since  two  parameters  are  required  for  the  representation  of  a  surface,  we 
assume  that  a  surface  may  be  represented  by  a  bivariate  vector-valued  function 

Q(u,v)  =  [x(u,v)  yiu,v)  z(u,v)]  (2-1) 

The  form  of  Q{u,v)  is 


(2.2) 


The  vectors  are  the  algebraic  coefficients.  There  are  sixteen  ag  vectors,  each  with 
three  components.  This  means  that  48  degrees  of  freedom,  or  coefficients,  must  be 
specified  to  define  a  unique  surface. 
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Equation  2.2  is  more  conveniently  written  as 

(?(u,v)  =  t/  C  (^-^^ 

where  U  =  [m^  u  1],  F  =  [v'  v  1],  C  is  the  coefficient  matrix,  and  is  the 
transpose  of  V.  Since'  the  elements  of  the  coefficient  matrix  are  three  component 
vectors,  the  C  matrix  is  a  4  x  4  x  3  array. 

A  bicubic  patch  is  bounded  by  four  curves  as  shown  in  Figure  2.4;  each  is  a 
parametric  cubic  curve.  To  demonstrate  this  fact,  if  we  substitute  v=0  into  equation 
2.2,  all  terms  containing  v  vanish,  and  the  equation  becomes 

Q{u)  =  a^^u^ +a^u'^ +a^u+a^^  (2-4) 

This  is  an  expression  for  a  parametric  cubic  curve.  Of  course,  the  same  can 
be  done  for  the  boundary  curves  on  u  =  0,  w=  1,  and  v=  1.  In  fact,  by  setting  either 
M  or  V  equal  to  some  constant,  one  can  produce  rulings  on  the  surface. 

The  network  of  curves  divides  the  surface  into  an  assembly  of  topologically 
rectangular  patches,  each  of  which  has  as  its  boundaries  2  M-curves  and  2  v-curves. 
Here  it  is  assumed  that  u  and  v  run  from  0  to  1  along  the  relevant  boundaries;  then 
Q{u,v),  0<u,  v<l,  represents  the  interior  of  the  surface  patch,  while  j2(u,0),  Q{ii,\), 
j3(0,v)  and  QO.,v)  represent  the  four  boundary  curves.  The  problem  of  defining  a 
surface  patch,  then,  is  that  of  finding  a  suitably  well-behaved  function  Q(u,v)  which 
reduces  to  the  correct  boundary  curve  when  u  =  0,  m=1,  v=0  or  v=l. 
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Figure  2.4  Boundary  Curves  of  a  Bicubic  Patch 
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Figure  2.5  Boundary  Conditions  Defining  a  Bicubic  Patch 
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Since  algebraic  coefficients  are  not  the  most  convenient  way  to  define  and 
control  the  shape  of  a  patch,  a  geometric  form  defining  a  patch  in  terms  of  certain 
boundary  conditions  related  to  the  algebraic  coefficients  is  developed.  To  describe 
the  boundary  conditions  16  vectors  are  needed,  4  of  them  corner  points,  and  8  of 
them  tangent  vectors  (Figure  2.5).  Four  more  vectors  called  twist  vectors  (not  shown 
in  the  figure)  are  used  to  show  how  the  tangent  vectors  across  a  patch  boundary 
change  along  that  boundary.  For  example,  the  twist  vectors  at  poo  <^^^  Pn  control  the 
way  pI,  changes  along  the  boundary  curve  u  =  0  from  pl^  to  p^i.  The  twist  vectors  are 
denoted  as  pH,  pZ,  Pm  and  p^. 

We  will  deal  with  Bezier  surfaces  in  this  research  since  they  are  attractive  in 
interactive  design.  Advantages  of  Bezier  surfaces  are 

1.  It  is  easy  to  predict  how  the  surface  will  respond  if  a  control  point  is  moved. 

2.  Local  control  is  easy  with  Bezier  surfaces.  Any  change  in  the  vertices  of  a  span 
will  only  affect  the  curve  within  that  span.  The  rest  of  the  curve  will  remain 
unaffected. 

3.  All  that  is  necessary  to  increase  the  order  of  any  curve  segment  is  to  specify 
another  interior  vertex.  This  greatly  increases  flexibility  and  overcomes  many  of  the 
difficulties  of  cubic  spline  fitting  and  parabolic  blending  techniques. 

4.  The  convex  hull  which  is  the  convex  polygon  obtained  by  connecting  the  control 
points  of  bicubic  Bezier  surfaces  allows  bounding  regions  for  the  surface  to  be  easily 
calculated,  greatly  increasing  the  efficiency  of  many  algorithms. 
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5.  The  Bezier  patch  generated  by  the  16  control  points  is  constrained  to  He  within 
the  convex  hull  of  these  points.  This  is  useful  for  cHpping  and  determining  when  two 
patches  might  intersect. 

6.  The  surface  shape  can  be  changed  without  disturbing  the  slope  design  points, 
which  may  then  be  independently  specified  to  satisfy  continuity  requirements  with 
adjacent  surfaces. 

7.  There  is  no  need  to  calculate  slope  and  tangent  vectors. 

8.  Bezier  surface  need  not  be  represented  by  a  square  array  of  control  points. 

There  is  also  a  disadvantage  to  the  use  of  Bezier  surfaces:  the  movement  of 
one  control  point  will  affect  the  entire  surface.  In  order  to  obtain  local  control  a 
patch  must  effectively  be  subdivided. 

A  Bezier  surface  is  a  tensor  product  of  Bezier  blending  functions  (Bernstein 
basis)  which  interpolates  between  the  first  and  last  vertices.  It  is  defined  by  a  set  of 
control  vertices,  in  3D  x-y-z  space,  that  is  organized  as  a  2D  graph  with  a  rectangular 
topology.  The  set  of  control  vertices  is  called  a  control  hull  or  control  graph 
[BARS84].  A  bicubic  Bezier  surface,  shown  in  Figure  2.6  takes  the  form 

where  m=/i  =  3  and  is  the  control  graph  specifying  the  location  of  the  (m  +  1)  by 
(n  +  1)  control  points.  The  basis  function,  B^^iu),  is  given  by 
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Figure  2.6  Bicubic  Bezier  Surface 
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where 


(2.6) 


ml 


(m-i)\  i\ 


with  m  being  the  degree  of  the  polynomial  and  /  the  particular  vertex  in  the  ordered 
set  (from  0  to  m). 

For  a  bicubic  Bezier  patch,  the  x  component  of  Qs^iUyV)  in  matrix  form  is 


xiu,v)  =  [Bo3(«)  B,3(u)  B^,(,u)  B,^iu)] 


^02  ^03 


■'^lo  ^12 


^20  ^22  ^23 


^30  ^32  ^33 


5.3(v) 

523(V) 


(2.8) 


The  bicubic  Bezier  surface  takes  the  matrix  form  when  the  basis  functions  are 
substituted  in  the  above  equation: 


xiu,v)  =  [(l-uf  3(l-uYu  3(1-u)m2  u^] 


^00   ^01    ^02  ^03 


^10  ^12  ^13 


^20   ^21    ^22  ■^23 


JTjQ   Xjj  A:33 


(1-v)' 
3(l-v)2v 
3(1-v)v2 


(2.9) 
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Equation  2.9  can  be  rewritten  as 

x(u,v)  =[u^      u  I]  Af/  [v3      V  1]' 

where 


(2.10) 
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and  P,  = 


^00   ^01    ^02  ^03 


■'^lO   ■^ll    ^12  ^13 


^20   ^21    ^22  ^23 


XjQ  ATjj 


The  matrix  contains  the  position  vectors  for  points  that  define  the  Bezier 
surface  patch.  Only  the  four  corner  points  Xoo,  x^a,  Xoj,  and  x,,  actually  lie  on  the 
patch.  The  points  x^,  x^^,  x^,  and  X22  control  the  cross  slopes  along  the  boundary 
curves  in  the  same  way  as  the  twist  vectors  of  the  bicubic  patch.  The  remaining 
points  control  the  slope  of  the  boundary  curves  [MORT85]. 


CHAPTER  3 

PROBLEM  DEFINITION  AND  REVIEW  OF  EXISTING  ALGORITHMS 


3.1  Problem  Definition 

The  task  we  are  primarily  concerned  with  is  that  of  intersecting  rays  with  a 
large  collection  of  primitive  objects  defining  an  environment.  For  each  pixel  in  the 
raster,  the  corresponding  ray  is  traced  to  the  first  surface  that  it  encounters  to 
determine  which  objects  in  the  scene,  if  any,  are  intersected  by  the  ray.  The  points 
of  intersection  of  this  ray  with  the  surfaces  are  computed  and  the  intersections  are 
sorted  in  depth.  The  surface  closest  to  the  image  plane  is  the  visible  surface  at  a 
particular  pixel  [ROGE85]. 

The  ray  is  represented  parametrically,  and  the  point  along  the  ray  is  given  by 

R(t)  =  RO  +  Dt  (3.1) 

where  RO  =  [x^    zd  is  the  ray  origin,  D  =  is  the  ray  direction,  and  t  is 

the  ray's  traveling  time  relatively  starting  from  the  origin  RO. 

Perspective  projection  enhances  the  realism  of  a  displayed  image  by  providing 
the  viewer  with  depth  cues.  A  perspective  transformation  need  not  be  applied  here 
since  the  ray  casting  method  determines  the  correct  perspective  as  an  automatic 
result  of  tracing  the  rays  from  the  eye.  The  objects  are  described  in  a  left-handed 
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coordinate  system  with  the  eye  at  the  origin  (see  Figure  3.1).  The  screen  is  centered 
about  the  positive  z-axis  and  acts  as  the  viewing  plane.   Without  this  centering 
process,  objects  may  be  skewed  in  the  viewing  plane.  Skewing  is  also  dependent  on 
the  distance  in  the  z  direction  of  the  viewing  plane.   The  further  the  distance 
between  the  eye  and  the  viewing  plane,  the  more  parallel  the  different  rays  travel, 
thereby  reducing  skewing  of  the  scene.  In  fact,  some  ray  tracing  algorithms  place  the 
eye  at  z  =  -«,  thus  having  all  rays  travel  in  the  same  direction  parallel  to  the  z-axis. 
Our  implementation  places  the  eye  at  the  origin  with  the  view  plane  at  a  user 
specified  distance  d  in  the  +z  direction.  The  effect  is  to  perform  a  single-point 
perspective  projection  onto  the  image  plane.    Since  the  z  value  represents  the 
distance  of  a  point  from  the  eye,  depth  information  is  included  in  the  projection 
equations.  Objects  that  are  farther  away  have  their  x  and  y  coordinates  divided  by 
larger  z  values  than  those  closer,  causing  these  distant  objects  to  be  drawn  smaller. 
Since  we  are  using  perspective  projection  and  the  center  of  projection  is  located  at 
the  origin  (0,0,0),  the  ray  equation  3.1  becomes 

Rit)  =  Dt  (3.2) 

where  Xo  =  y„  =     =  0,  Z),  =  DJd,      =  D^/d,      =  1. 
A  bicubic  Bezier  surface  is  given  by  the  equation 

<?(m,v)  =  [x(u,v)  y(.u,v)  z(u,v)]  (3-3) 
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Figure  3.1 


Left  Handed  Coordinate  System 
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The  intersection  of  a  ray  with  a  surface  of  this  form  is  obtained  as  the  simuhaneous 
solution  of  the  three  nonlinear  equations  given  by  the  vector  equation: 

F  =  Qiu,v)  -  D  t  -  RO  (3-'*) 

The  time  spent  for  this  ray/surface  intersection  computation  is  the  bottleneck 
of  the  ray  casting  algorithm.  Even  for  a  screen  resolution  of  512x512,  250  000  rays 
should  be  generated.  For  complex  scenes  modeling  complex  lighting  effects, 
antialiasing  or  motion  blur  [COOK84],  this  calculation  must  be  performed  millions 
of  times.  Direct  methods  of  calculation  have  been  found  for  several  surface  types, 
including  spheres,  polygons  [WHIT80],  cylinders  and  volumes  of  revolution  [KAJI83], 
Steiner  patches  [SEDE84],  and  bicubic  surface  patches  [KAJI82].  Numerical 
techniques  have  been  employed  in  the  calculation  for  certain  algebraic  surfaces 
[HANR83]  and  parametric  surface  patches  [JOY86,  LEVN87,  LISC90,  SWEE86, 
TOTH85,  YANG87]. 

Several  authors  addressed  the  problem  of  reducing  the  number  of  ray/surface 
intersection  calculations  using  the  techniques  of  recursive  subdivision  [ARV087, 
FUJI86,  GLAS84,  WOOD89],  bounding  volumes  [WHIT80,  KAY86],  hierarchical 
bounding  volumes  to  further  improve  the  efficiency  of  the  bounding  volumes 
[RUBI80,  WEGH84],  and  coherence  [HECK84]. 

One  research  direction  in  image  synthesis  has  been  to  exploit  the  inherent 
parallelism  in  the  algorithms  used  to  create  realistic  images.  Ray  casting  is  an  ideal 
candidate  for  parallel  processing.  Various  hardware  techniques  [BADO90,  DIPP84, 
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PLUN85,  ULLN83]  have  been  used  for  acceleration  of  ray  tracing.  Most  of  these 
approaches  are  based  on  tasking  multiple  microprocessors  to  manipulate  a  subset  of 
rays  in  parallel.  PuUeybank  and  Kapenga  as  well  as  Ullner  also  explored  the  use  of 
VLSI  circuits  to  speed  up  the  process  of  ray/surface  intersection  [PULL87,  ULLN83]. 

Since  in  this  study  we  are  dealing  with  the  intersection  of  a  parametric  surface 
with  a  ray,  a  detailed  survey  about  ray/parametric  surface  intersection  will  be 
presented  rather  than  giving  all  the  work  for  intersecting  rays  with  surfaces. 
Although  the  methods  used  in  other  type  of  surfaces,  such  as  implicit  surfaces,  are 
similar  to  the  ones  employed  in  our  study,  those  won't  be  explained  here  in  detail. 

3.2  Literature  Review 
Equation  3.3  can  be  solved  for  points  of  intersection,  u,  v,  and  t  using 
numerical  techniques  or  subdivision  techniques.  There  is  also  a  trend  towards 
parallelizing  ray  casting  algorithms,  especially  towards  performing  ray/object 
intersection  calculations  in  parallel.  In  recent  years  several  methods  to  calculate  the 
intersection  between  parametric  surfaces  and  rays  have  been  proposed.  But, 
published  algorithms  on  ray  tracing  parametric  surfaces  are  relatively  few  compared 
with  the  number  which  deal  with  other  kinds  of  surfaces.  Unfortunately,  most  of 
these  algorithms  are  either  very  expensive,  or  possess  other  disadvantages  that 
preclude  their  use  in  real-life  rendering  systems. 
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3.2.1  Numerical  Techniques 

Kajiya  [KAJI82]  uses  ideas  from  algebraic  geometry  to  obtain  a  numerical 
procedure  for  intersecting  a  ray  with  a  bicubic  surface  patch.  A  three-dimensional 
ray  is  regarded  as  the  intersection  of  two  planes  OtX+by+c^+di  =  0  and 
a^+biy+ciz+di  -  0.  For  the  bicubic,  Kajiya  then  substitutes  the  parametric  surface 
equations  into  the  plane  equations  to  obtain  polynomial  equations  of  the  form 
Fi(u,v)  =  0,  F-^{u,v)  -  0,  which  can  be  solved  numerically.  The  Laguerre  method 
used  in  that  paper  is  cubically  convergent  and  therefore  takes  only  one  iteration  for 
linear  and  quadratic  functions.  The  method  is  slow  but  definite.  It  converges  to  the 
nearest  solution  more  quickly  than  subdivision,  but  is  more  expensive  at  each  step. 
The  algorithm  uses  floating  point  operations.  Kajiya  estimates  that  6000  floating 
point  operations  may  have  to  be  performed  in  order  to  find  all  of  the  intersections 
between  one  ray  and  one  bicubic  patch. 

Toth  solves  the  ray/surface  equation  by  minimizing  the  function  {Q(u,v)  •  Di 
•  ROy  with  respect  to  u,  v  and  t  [TOTH85].  This  method,  known  as  multivariate 
Newton  iteration,  has  the  advantage  of  being  able  to  handle  a  line  which  either 
touches  the  surface  or  has  no  intersection  with  it.  But,  the  computation  of  the 
control  points  for  the  interval  extension  method  used  in  this  paper  requires  more 
than  40%  of  the  total  CPU  time  used,  which  is  a  serious  drawback  in  the  algorithm. 
The  main  reason  for  this  is  that  the  method  fails  to  utilize  coherence. 

Joy  and  Bhetanabhotla  propose  another  algorithm  [JOY86]  using  a 
quasi-Newton  iteration  which  is  basically  a  generalization  of  the  method  used  by 
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Toth  to  solve  the  ray/surface  intersection  problem.  To  speed  up  the  convergence  of 
the  quasi-Newton  method,  ray  coherence  is  utilized:  numerical  calculations  in 
adjoining  pixels  are  used  forming  the  initial  approximations  for  new  calculations. 
Although  this  method  has  computational  advantages  over  the  conventional  Newton 
method  in  obtaining  the'  initial  guess,  naive  utilization  of  ray  coherence  may  cause 
convergence  to  incorrect  solutions  and  subdivision  of  object  space  to  prevent  this  may 
lead  to  excessive  memory  requirements. 

Sweeney  and  Bartels  [SWEE86]  present  a  method  for  rendering  B-spline 
surfaces  for  ray  tracing.  It  combines  subdivision  and  numerical  techniques:  the  B- 
spline  surface  subdivision  technique  is  utilized  to  generate  a  close  tiling  of  the 
surface  with  control  points  which  are  then  used  as  initial  approximations  for 
Newton's  method,  which  in  turn  is  used  to  generate  the  final  intersection  point. 
Time  is  saved  at  the  expense  of  using  more  memory.  There  is  also  no  guarantee  that 
the  Newton  iteration,  once  initiated,  will  converge  to  the  correct  solution. 

Levner  et  al.  [LEVN87]  describe  a  method  that  was  originally  developed  for 
ray  tracing  /3-spline  surfaces,  but  has  been  applied  successfully  to  bicubic  surfaces  in 
general.  Their  method  is  actually  a  variation  on  that  of  Sweeney  and  Bartels. 
Instead  of  refining  the  control  vertex  mesh,  they  create  a  mesh  of  points  that  lie  on 
the  surface  itself.  The  advantage  of  this  approach  is  that  point  evaluations  on  the 
surface  allow  treatment  of  general  parametric  surfaces.  But,  the  memory 
requirements  of  this  method  are  still  very  large. 
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Another  improvement  on  the  method  reported  by  Sweeney  and  Bartels  is 
described  by  Yang  [YANG87].  An  individual  octree  is  created  for  each  B-spHne 
patch  by  subdividing  its  bounding  box.  It  provides  a  more  tight  bounding  scheme 
than  that  of  Sweeney  and  Bartels.  Since  all  leaf  boxes  hit  by  a  ray  are  searched 
before  actually  attempting  ray/surface  intersection,  the  algorithm  is  inefficient. 

Lischinski  and  Gonczarowski  [LISC90]  present  improvements  of  previously 
known  techniques  and  introduce  a  space-  and  time-efficient  algorithm  utilizing  these. 
This  algorithm  combines  numerical  and  subdivision  techniques,  thus  allowing 
utilization  of  ray  coherence  and  greatly  reducing  the  average  ray/surface  intersection 
time.  Although  some  of  the  other  methods  mentioned  above,  or  in  the  next  section, 
are  faster  than  their  method,  they  claim  that  theirs  is  qualitatively  superior  to  them, 
since  it  doesn't  possess  the  others'  disadvantages. 

3.2.2  Subdivision  Techniques 

Many  subdivision  methods  have  also  been  employed  to  solve  the  intersection 
of  a  ray  with  a  bicubic  patch.  The  subdivision  algorithm  uses  much  storage  and 
converges  only  at  a  linear  rate  to  a  solution. 

Whitted  [WHIT80]  was  the  first  to  describe  recursive  subdivision  methods  for 
ray  tracing  parametric  surface  patches.  If  the  bounding  volume  of  a  patch  is  pierced 
by  the  ray,  then  the  patch  is  subdivided  and  bounding  volumes  are  produced  by  each 
subpatch.  The  subdivision  process  is  repeated  until  either  no  bounding  volumes  are 
intersected,  or  the  intersected  bounding  volume  is  smaller  than  a  predetermined 
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minimum.  This  method  requires  testing  each  surface  against  each  ray,  which  resuUs 
in  too  many  subdivisions;  thus  it  is  very  inefficient. 

Rubin  and  Whitted  [RUBI80]  have  generalized  the  method  used  by  Whitted 
representing  the  object  space  entirely  by  a  hierarchical  data  structure  consisting  of 
bounding  volumes.  TKe'  algorithm  utilizes  parallelepipeds  for  bounding  boxes  with 
an  orientation  that  minimizes  their  sizes.  These  hierarchical  volumes  were 
constructed  by  hand  for  each  scene,  a  time  consuming  process  not  suitable  for 
animation.  Although  the  hierarchical  representation  yields  an  improvement  of  about 
100:1  over  naive  subdivision,  it  is  still  too  slow. 

Recently,  a  new  approach  based  on  recursive  subdivision  was  presented  by 
Woodward  [WOOD89].  The  subdivision  process  is  carried  out  in  the  orthogonal 
viewing  coordinate  system  of  the  ray.  To  speed  up  the  computation,  the  control 
vertex  mesh  is  scaled  to  large  integer  numbers.  Subdivisions  are  done  using  integer 
addition  and  bit  shifts,  taking  care  not  to  lose  accuracy.  This  method  is  faster  than 
Whitted's  method  and  it  is  reported  to  compare  favorably  with  methods  such  as  that 
used  by  Sweeney  and  Bartels,  in  spite  of  the  fact  that  it  doesn't  attempt  to  exploit 
coherence.  But,  like  the  previous  subdivision  methods,  it  has  the  drawback  that  the 
same  amount  of  computation  must  be  performed  to  intersect  a  ray  with  every  patch: 
simple  patches  cost  as  much  as  complicated  ones. 


37 

3.2.3  Parallel  Processing  Techniques 

In  the  original  algorithm  by  Whitted  over  75%  of  the  time  was  spent 
calculating  ray/object  intersections.  Since  then  much  work  has  been  done  to  speed 
up  the  intersection  tests  by  using  parallelism  to  perform  many  ray  tests  in  parallel. 
All  the  researchers  mentioned  above,  one  way  or  another,  explore  the  possibility  of 
reducing  the  intersection  computation,  but  their  approaches  were  not  simple  enough 
for  hardware  implementation  since  they  did  not  take  into  account  that  computation 
for  intersection  can  be  done  in  parallel.  Some  parallel  machines  have  been  proposed 
in  the  literature,  but  we  have  yet  to  see  an  implementation  of  such  a  machine. 

UUner  in  his  dissertation  discusses  how  to  parallelize  ray  tracing  in  various 
ways  [ULLN83].  This  is  one  of  the  early  studies  of  parallelizing  ray  tracing  and  how 
ray  tracing  might  perform  on  different  architectures. 

In  addition  to  using  subdivision  methods,  PuUeybank  also  implemented 
ray/surface  intersection  in  a  VLSI  chip  [PULL87].  But,  recursive  subdivision  for 
curve  and  surface  rendering  is  expensive  to  implement  in  hardware  due  to  the  high 
speed  stack  memory  requirements. 

Kedem  [KEDE84a]  proposes  an  object  space  architecture  based  on  the  CSG 
and  ray  casting  concepts.  In  his  design,  Kedem  implemented  the  CSG  tree  in 
hardware.  The  reconfigurable  tree  structure  consists  of  two  kinds  of  nodes,  namely 
the  Primitive  Classifiers  (PCs)  as  leaf  nodes  and  Combine  Classifiers  (CCs)  as 
internal  nodes.  Each  PC  is  assigned  a  certain  number  of  primitives.  During  the 
image  generation  process,  the  system  casts  rays  from  the  eye  through  every  pbcel  on 
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the  simulated  display.  Each  PC  takes  these  rays  and  computes  the  intersection  of 
every  ray  with  its  associated  primitives.  This  computation  is  done  in  parallel.  The 
result  of  the  computation  is  passed  up  the  tree.  Each  CC  takes  the  results  from  its 
left  and  right  nodes,  combines  them  according  to  the  operation  assigned  to  it  and 
passes  the  result  up  the"  tree.  At  the  root  of  the  CSG  tree,  the  color  value  of  the 
pixel  is  calculated.  This  architecture  computes  quadratic  surface  visibility. 

Kedem  and  Ellis  [KEDE84b]  use  CSG  for  building  up  a  complex  object  out 
of  simple  primitives  like  cones,  cylinders,  and  spheres.  The  main  drawback  of  such 
a  system  is  that  representing  an  object  with  cubic  or  higher  order  surfaces  require 
numerous  quadratic  primitives  and  even  then  is  at  best  an  approximation  to  the 
original  surface. 

A  similar  algorithm  for  planar  patched  surfaces  was  presented  by  Thomas 
[THOM83].  It  also  used  ray  casting  techniques  and  was  specifically  designed  for 
VLSI  implementation.  Unlike  Kedem's  algorithm,  it  performs  perspective 
projections. 

The  intersection  calculation  of  a  ray  with  a  parametric  patch  is  still  the  subject 
of  ongoing  research  in  this  area.  To  produce  a  highly  parallel  machine  capable  of 
real  time  operation,  even  compromising  image  quality,  is  also  one  of  the  challenging 
areas  people  have  attacked.  The  problem  still  remains  to  find  an  efficient  execution 
of  ray  casting  algorithm  for  curved  surfaces  and  usage  of  parallel  processing 
techniques  since  ray  casting  is  inherently  a  parallel  process. 


CHAPTER  4 
THE  PROPOSED  ALGORITHM 

The  choice  of  rendering  algorithm  is  dependent  on  the  model  representation 
and  the  degree  of  realism  desired.  Ray  casting,  the  rendering  algorithm  of  our 
choice,  handles  one  pixel  at  a  time  and  compares  every  object's  depth  at  this  pixel 
as  opposed  to  the  z-buffer  algorithm  which  handles  one  object  at  a  time  and 
operates  on  all  pixels  that  this  object  covers.  The  proposed  algorithm  determines 
visibility  by  tracing  a  bundle  of  rays  from  the  observer  to  the  objects  in  the  scene. 

The  direct  intersection  of  a  ray  with  a  parametric  surface,  i.e.  without 
generating  a  polygonal  approximation  of  the  surface,  has  generally  been  done  using 
either  rather  sophisticated  numerical  methods  or  simpler  but  inefficient  subdivision 
algorithms.  This  dissertation  focuses  on  numerical  techniques  to  calculate  the 
intersection  of  a  ray  and  surface.  The  techniques,  as  described  here,  deal  with 
Bezier  surface  patches,  but  can  easily  be  extended  to  more  general  parametric 
surfaces. 

The  algorithm  consists  of  essentially  two  parts.  The  first  part  is  a  series  of 
preprocessing  steps.  The  actual  ray/surface  intersection  is  the  second  part  of  the 
algorithm.  Preprocessing  involves  selection  of  a  bounding  volume,  construction  of 
a  bounding  box  tree,  determination  of  boundary  and  silhouette  edges  in  the  patch, 
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and  calculation  of  the  surface  normal.  These  steps,  Newton's  method  as  a  numerical 
technique  which  is  used  for  the  ray/Bezier  surface  intersection  calculation,  actual 
intersection  algorithm,  and  illmnination  model  are  discussed  in  this  chapter. 

4.1  Preprocessing  Steps 
Since  the  ray  casting  procedure  is  so  computationally  expensive,  any 
preprocessing  that  can  be  performed  to  help  reduce  computations  should  be  done. 
Preprocessing  is  done  before  ray-tracing  is  begun.  This  saves  time  by  eliminating 
redundant  calculations  for  every  line/object  intersection.  The  preprocessing  steps  to 
follow  are  as  follows: 

1.  Selection  of  a  bounding  volume. 

2.  Construction  of  a  bounding  box  tree. 

3.  Determination  of  boundary  edges. 

4.  location  of  silhouette  edges. 

5.  Calculation  of  surface  normal. 

4.1.1  Selection  of  a  Bounding  Volume 

Bezier  surfaces  are  capable  of  modelling  a  wide  variety  of  shapes  which  are 
not  easy  to  model  in  other  ways.  Parametric  surfaces,  such  as  Bezier  surfaces,  are 
employed  increasingly  as  the  computer  graphics  field  matures.  Since  such 
representations  involve  more  sophisticated  mathematics  than  the  simpler  alternatives, 
it  is  more  computationally  expensive  to  produce  their  images.  For  example,  in  ray 
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casting  a  Bezier  surface,  it  might  require  thousands  of  floating  point  iterations  to  find 
intersections  of  a  ray  with  a  Bezier  surface  patch  [KAJI86,  TOTH85].  This  is  much 
more  expensive  than  finding  the  intersections  of  a  ray  with  a  polygon  or  a  quadratic 
surface. 

The  selection  of  "a  bounding  volume  helps  to  accelerate  ray  casting  Bezier 
surfaces.  It  is  possible  to  use  the  convex  hull  of  a  Bezier  patch  as  its  bounding 
volume.  Such  a  box,  however,  is  not  very  tight,  since  control  points  could  be  quite 
far  from  the  surface  they  define.  We  present  a  different  technique:  bounding  boxes 
defined  by  the  exact  minimum  (x„i„,  y^)  and  maximum  (x^^^,  y^^,)  values  attained  on 
a  surface  are  tighter  than  other  bounds  obtained  using  approximate  methods  such  as 
convex  hulls  of  the  control  vertices  (see  Figure  4.1).  Points  on  the  surface  are 
evaluated,  using  x(u,v),>'(u,v)  and  z(u,v)  given  in  chapter  2,  by  varying  two  parameters 
incrementally.  The  increments  are  obtained  dividing  the  parameters  u  and  v  by  an 
integer.  The  forward  differencing  technique  which  is  presented  in  chapter  5  is  used 
to  obtain  the  increments.  This  scheme  effectively  reduces  the  size  of  the  void  area. 
An  intersection  test  with  the  patch  is  performed  only  if  its  bounding  box  is 
intersected  by  the  ray. 

The  information  sought  for  an  intersection  test  with  a  bounding  volume  is 
simply  a  yes-no  response.  The  computation  of  the  actual  point  of  intersection  is  not 
necessary.  First  the  transformation  to  make  the  ray  coincident  with  the  z  axis  is 
computed.  The  transformation  is  then  applied  to  each  of  the  eight  vertices  of  the 
bounding  box.  A  new  bounding  box  is  then  determined  using  the  transformed 
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Figure  4.1  The  Extent  to  the  Surface  Q(u,v). 
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vertices.  An  intersection  between  the  ray  (z  axis)  and  the  box  occurs  if  the  signs  of 
the  new  x^  and  x^  are  opposite  as  well  as  the  signs  of      and  y^. 

4.1.2  Construction  of  a  Bounding  Box  Tree 

The  second  step"  in  the  preprocessing  of  a  surface  is  to  create  a  tree  of 
bounding  boxes.  The  entire  scene  is  bounded  by  a  global  bounding  box  at  the  root 
node  of  the  tree.  Every  internal  node  of  the  tree  contains  pointers  to  its  children, 
and  its  bounding  box  is  just  big  enough  to  contain  all  of  its  children's  boxes.  Each 
leaf  node  holds  representative  values  of  u  and  v  for  the  section  of  the  surface  it 
represents;  these  values  will  be  used  later  as  an  initial  guess  for  a  Newton  process. 
If  a  ray  hits  a  leaf  box,  the  intersection  of  the  ray  and  the  patch  must  be  within  the 
box  or  very  close  to  it.  The  mean  parameter  values  of  the  points  contained  in  the 
leaf  box  hit  by  a  ray  provide  good  initial  guesses  for  the  Newton  iteration  for  the 
calculation  of  the  parameter  values  of  the  intersection  point. 

Bounding  boxes  are  useful  for  complex  objects,  such  as  bicubic  surfaces,  when 
the  object  is  fairly  small.  Since  the  time  to  find  the  intersection  with  a  bicubic 
surface  is  large,  but  bounding  box  intersections  are  fast,  the  ray  tracer  can  save  time 
for  all  the  negative  tests.  In  the  case  where  the  ray  does  enter  the  box,  some 
additional  overhead  is  incurred;  however,  this  cost  is  easily  justified  by  reduced  times 
for  negative  tests. 
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When  the  initialization  (modelling)  stage  is  complete,  a  patch  is  represented 
by  its  defining  matrices  (C  =  M  P  hf)  and  the  parameters  kept  in  its  bounding  box 
requirement  is  less  than  that  required  by  the  polynomial  representation, 

4.1.3  Determination  of  Boundary  Edges 

It  is  useful  to  have  a  univariate  description  of  each  bounding  edge  of  the 
patch.  Bicubic  surface  patches  have  four  natural  boundary  curves:  C(m,0), 
Q(0,v),  and  Q{\,v),  each  of  which  is  cubic  in  one  variable.  This  is  easily  derived  from 
the  standard  bivariate  coefficient  matrbc  C  =  M  P  AT  as  follows: 

Q(u,0)  =  edge^iu) 

=  [M^u^u  l][q[0  0  0  If 

(4.1) 

Q{u,\)  =  edge^iu) 

=  [u^      u  l][q[l  1  1  If 

The  expressions  for  j3(0,v)  and  Q{\,v)  are  similar. 

Twelve  coefficients  are  required  to  describe  a  cubic  curve,  four  each  for  the 
X,  y,  and  z  components  and  these  can  easily  be  computed  from  the  C  matrix.  The 
coefficients  for  edge^iu),  for  example,  are  summations  of  the  rows  of  C.  In  addition, 
surface  normal  information  along  each  bounding  edge  of  the  patch  must  be  provided 
for  use  by  both  the  shader  and  the  silhouette  detection  routine.  The  silhouette 
detection  and  the  normal  calculation  are  explained  in  the  next  two  sections. 
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4.1.4  Location  of  Silhouette  Edges 

The  silhouette  edges  of  a  surface,  if  it  contains  any,  are  defined  as  those 
curves  on  the  surface  along  which  the  z-component  of  the  normal  vector  is  zero, 
assuming  the  eye  is  on  the  z  axis.  In  general,  since  the  order  of  the  surface  normal 
is  quintic  (as  we  will  see"  in  the  next  section)  the  order  of  the  silhouette  curve  is  also 
greater  than  the  cubic,  but  it  is  approximated  here  by  a  piecewise  cubic  interpolant 
so  that  the  silhouette  can  be  treated  the  same  as  any  other  edge.  Boundary  and 
silhoutte  edges  are  shown  in  Figure  4.2. 

To  compute  a  silhouette  edge,  the  cubic  approximation  of  the  normal  surface 
and  the  z  =0  plane  are  converted  to  the  Bezier  surface  representation  [SCHW82]. 
This  provides  a  surface  representation  of  the  normals  at  each  point  on  the  surface. 
The  normal  vector  of  the  surface  at  the  point  (u,v)  is  evaluated.  A  set  of  points 
which  lie  on  the  intersection  line  of  the  normal  surfaces  are  obtained  using 
subdivision  and  the  convex  hull  property  of  Bezier  meshes.  The  surface  equation  is 
calculated  using  (u,v)  values  to  determine  a  corresponding  set  of  points  on  the 
surface  which  represent  the  actual  silhouette  in  screen  space.  Connecting  these 
points  with  straight  lines  is  guaranteed  to  be  within  a  user-specified  tolerance  of  the 
true  silhouette. 
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Figure  4.2  Two  Types  of  Edges  for  Curved  Surfaces 
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4.1.5  Calculation  of  Surface  Normal 

The  normal  vector  to  a  Bezier  patch  can  be  found  by  taking  the  vector  cross 
product  of  the  tangent  vector  in  the  u  direction  and  the  tangent  vector  in  the  v 
direction. 

The  Bezier  surface  patch  is  defined  by 

Q(u,v)  =  UMPM^V^  ^"^-^^ 

where  U  =  [i^u^ul],V  =  [v^  v  1],  M is  the  Bezier  matrix  given  in  chapter  2,  and 
P  is  the  matrix  of  control  points.  The  u  tangent  vector  of  the  surface  Q{u,v)  is 

^("■v)  =  d(UMPSf^V^)  ^  dUj^p^TyT  ^  [3„2  2u  1  0]MPM^V^  =  U'MPM^V  (4-3) 
du  du  du 

and  the  v  tangent  vector  is 

=  djUMPM^V^)  ^  fjj^pMTdVl     UMPM^[3v'  2v  1  0]^  =  UMPM^V  (4.4) 
dv  dv  dv 

Both  tangent  vectors  are  parallel  to  the  surface  at  the  point  (m,v)  and, 
therefore,  their  cross-product  is  perpendicular  to  the  surface.  Each  of  the  tangent 
vectors  is  a  3-tuple  since  equation  4.2  represents  the  x,  y,  and  z  components  of  the 
patch.  The  normal  is  given  by 
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where  x„  are  the  x,  y,  and  z  components  of  the  u  tangent  vector  and  x„  y„  z,  are 
the  X,  y,  and  z  components  of  the  v  tangent  vector.  This  is  a  quintic  expression  in  u 
and  V.  In  our  algorithm,  we  chose  the  method  given  in  [CATM74]  to  calculate  the 
normal:  shades  are  computed  by  calculating  a  cubic  approximation  to  the  normal  at 
each  point  on  the  surface  in  order  to  simplify  the  computation.  Twelve  coefficients 
are  required  to  approximate  the  cubic  curve.  This  way,  the  rendering  system  could 
treat  positional  and  normal  information  in  a  uniform  way. 

Each  component  of  the  normal  vector  can  be  approximated  by  a  Coons  patch, 
corner  positions,  derivatives  in  each  direction  at  the  corners,  and  cross-derivatives  at 
the  corners,  XJiu,v)  s  UCP^CV,  where 


X„(0,0) 

XjiO,l) 

X„(1,0) 

dX^iO,!) 

du 

du 

dx^a,i) 

du 

du 

dX^iOfi) 

dX^{Q,\) 

dv 

dv 

JX„(1,0) 

dX(,\,\) 

dv 

dv 

dX(0,0) 

d%(S),\) 

dudv 

dudv 

d%i\fi) 

dXiU) 

dudv 

dudv 

(4.6) 


and 


C  = 
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-2 
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0 

1 

0 

1 

-2 

1 

0 

1 

0 

0 

0 

1 

-1 

0 

0 

(4.7) 
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The  y  and  z  components,  Y„  Z„  are  treated  similarly.  The  values  of  the  matrix  P„ 
Py,  and  are  evaluated  at  u=0,  iz=  1,  v=0,  and  v=  1.  These  functions  approximate 
unnormalized  normal  vector  functions,  therefore,  the  normal  vector  is  normalized 
before  use. 

The  elements  iri  the  matrix  P^  are  evaluations  of  the  functions  at  the  corners 
of  the  patch  and  they  are  given  by 

=  iU"MP  M^VWMPM^V')  +  (U'MP  M^V'XU'MP^M^V) 
du  ^ 

-  (U'MP^M^V')iU'MP^M^V)  -  (lJMPyM''V'){U"MP^M''V) 

dX(u,v) 


=  (U'MP  M^V'){UM^PM^V')  +  iU'MPM^V){UMP  M^V) 


dv         '         '  '  '  '  (4.8) 

-  (UMP^M^V'XU'MP^M^V)  -  (UMP^M^V'XU'MP^M^V) 


=  iU"MP^M^V'){UMP^M^V')  +  (U"MP^M^V){UMP^M^V") 


dudv 

+  iU'MP^M^V'XU'MP^M^V)  +  (U'MP^M^VXU'MP^M^V") 

-  (U'MPyM^V"){U'MP^M^V)  -  iU'MP^M^V'XU'MP^M^V) 

-  (UMP^M^V")iU"MP^M^V)  -  {UMPyM''V')iU"MP^M^V') 


4.2  Newton's  Method 
An  intersection  between  a  surface  Qiu,v)  and  a  ray  /?(/)  occurs  when  and  only 

when 


F(u,v,0  =  <?(«,v)  -  R(t)  =  0 


(4.9) 
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Since  bulk  of  the  computation  time  for  the  whole  intersection  calculation  is  taken  by 
the  root  finding  procedure,  it  is  crucial  to  use  a  highly  efficient  root  finder  and  the 
Newton's  method  is  the  one  used  here.  This  method  can  give  the  required  solutions 
to  any  desired  accuracy. 

Newton's  method  is  a  procedure  for  calculating  the  root  of  a  polynomial 
equation  or  a  system  of  polynomial  equations.  These  equations  can  be  solved 
iteratively  by  the  Newton  method  provided  that  a  good  initial  approximation  is 
known.  Newton's  method  converges  quadratically,  i.e.,  near  a  root  the  number  of 
significant  digits  approximately  doubles  with  each  step.  This  very  strong  convergence 
property  makes  Newton's  method  the  method  of  choice  for  any  function  whose 
derivatives  can  be  evaluated  efficiently,  and  whose  derivative  is  continuous  in  the 
neighborhood  of  a  root.  In  the  case  of  a  polynomial  equation  in  one  variable, 
Newton's  method  has  a  simple  geometric  interpretation.  If  Xo  is  our  initial  guess  for 
the  solution  to  the  equation  f(x)  =  0,  we  calculate  the  tangent  fix^).  The  next 
approximation,  Xi  is  given  by  the  point  where  the  tangent  crosses  the  axis. 

Equation  4.9  is  equivalent  to 

=  xiu,v)  -  D^-t  =  0 
Py  =  y(u,v)  -  D^'t  =  0  (4.10) 
=  z(«,v)  -  t  =  0 

where  F  is  a  vector,  in  the  form  of  F  =  [F,  F^  FJ.  The  system  F(u,v,t)  =  0  can  be 
reduced  to  a  system  of  two  equations  in  two  unknowns,  eliminating  /.  Then  we  have 
the  equivalent  system 
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g,(u,v)  =  x(u,v)  -  Z)/z(u,v)  =  0 
g^M  =  y(u,v)  -  Dy-z{u,v)  =  0 

Letting  s  =  (u,v),  equation  4.11  can  be  written  in  vector  form  as 

gis)  =  0 


(4.11) 


(4.12) 


Newton    iteration    generates    a    sequence    of   successively  improved 
approximations  to  the  solution  of  a  system  of  equations  above,  using  the  relation 


(4.13) 


where  /  is  the  Jacobian  matrix  defined  as 


J  = 


du  dv 
du  dv 


(4.14) 


calculated  at  u  =  ,  v  =  v,  and  J'\x,)  g  (x,)  is  the  Newton  step.  We  can  rewrite  the 
equation  4.13  as 


-7' 

^l("..'^r)' 

(4.15) 


The  system  of  equations  above  is  solved  iteratively  provided  that  the  Jacobian  matrix, 
/,  is  non-singular  at  each  approximation  point  and  there  is  a  good  initial  guess  to 
start  with.  There  is  even  no  need  to  calculate  the  inverse  matrix  since  the  inverse 
of  2x2  Jacobian  matrix  is  trivial.  Since     can  be  expressed  explicitly  as 
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1  •'c 


01 


(4.16) 


det(7)   -J,.  J{ 


00 


equation  4.15  can  be  rewritten  as 


u, 


(4.17) 


V, 


where  D{u,v)  =  1  /  detiJ). 

Once  the  Newton  steps  have  been  calculated  with  an  initial  guess  (Uo,Vo), 
equation  4.17  is  used  to  evaluate  {u^,v^)  from  (Mo,Vo),  then  {u^,v^)  from  (Ui,Vi),  and  so 
on,  until  one  of  the  following  conditions  are  satisfied: 
1.  |gi(",.i,v,„)  I  +  |g2(«..i,v,^i)  I  <  tolerance 
l.uorv  are  outside  the  parametric  intervals 

3.  |^i(«,.i,v,,i)|  +  |g2(u..„v,.,)l  >  ki(w^v,)|  +  \g^{u„v;)\ 

4.  n+l  >  maximum  number  of  steps 

If  the  iteration  is  terminated  by  a  case  1,  it  is  considered  that  the  ray  does  hit  the 
patch,  and  u,+i,v,+i  are  returned  as  the  parameters  for  calculating  the  coordinates  and 
the  derivatives  at  the  intersection  point.  If  the  iteration  is  terminated  by  cases  2  or 
3  or  4,  it  is  considered  that  no  intersection  happens,  and  NIL  is  returned. 

In  this  implementation,  tolerance  and  maximum  number  of  steps  are  chosen 
as  0.01  and  5,  respectively,  although  these  values  can  be  varied  according  to  the 
surfaces  used.  It  was  found  all  Newton  calls  were  terminated  by  cases  1,  2,  or  3,  and 
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that  about  90%  calls  required  only  one  iteration.  The  search  algorithm  is  performed 
in  the  surface  parameter  space  only.  The  ray  parameter  is  ignored,  even  while 
performing  Newton  iteration,  until  a  solution  is  found.  The  Newton  iteration  finds 
a  pair  (u,v)  such  that  a  point  Qiu,v)  on  the  surface  is  also  a  point  in  a  given  ray.  The 
Newton  method,  in  high-level  pseudocode,  is  shown  in  Figure  4.3. 

4.3  Algorithm 

Primitive  objects  such  as  bicubic  Bezier  surfaces  are  rendered  one  ray  at  a 
time.  The  algorithm  developed  utilizing  this  ray  casting  technique  for  bicubic  Bezier 
surface  is  as  follows. 

A  list  of  tree  nodes  to  be  tested  for  intersection  with  the  ray  is  sorted  by  the 
minimum  distance  from  the  origin  of  the  ray  to  the  bounding  box  of  the  node  so  that 
the  next  node  in  the  list  is  always  closest  to  the  origin  of  the  ray.  The  list,  at  the 
beginning,  contains  only  the  root  node  of  the  tree.  At  each  iteration,  the  closest 
node  on  the  active  list  is  chosen  and  removed.  If  the  node  is  an  internal  node,  each 
of  its  children  is  tested  in  turn  for  possible  intersection  with  the  ray.  If  the  ray  hits 
the  bounding  box  of  a  child,  the  child  is  put  back  into  the  list.  The  nodes  not 
intersected  by  the  ray  are  thrown  away,  thereby  saving  memory.  If  the  node  is  a  leaf 
node,  then  we  use  the  contained  (u,v)  parameter  values  to  initiate  the  Newton's 
method.  The  loop  exits  when  either  the  list  of  nodes  to  be  tested  is  empty,  or  an 
intersection  is  found  whose  distance  from  the  origin  of  the  ray  is  less  than  the 
minimum  distance  from  the  origin  to  the  next  node  in  the  list. 
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Newton  (surface,  treenode,  ray,  intersection,  t,  u,  v) 
{ 

calculate  C  matrix         /*  C„  Cy,  C,  */ 
get  initial  values  for  u  and  v  from  treenode 
success  =  failure  =  FALSE 

while  (not  success  &&  not  failure)  { 
calculate  jacobian,  J 
calculate  det(J) 
if  (det(J)  =  0)  { 

approximate  J 

calculate  det(J)  again 

} 

calculate  J'^ 

calculate  gi(u,v),  g2(u,v) 
error  =  |gi(u,v)|  +  |g2(u,v)| 
if  (error  <  tolerance) 

success  =  TRUE 
else  if  (iterations  >  =  maxit)  &&  (error  >  =  last  error) 
&&((u<0)  II  (u>l)  II  (v<0)  II  (v>l)) 

failure  =  TRUE 
else  { 

calculate  Newton  steps 
calculate  u^+i,  v^^j 
increment  iteration  count 

} 

} 

if  (success)  { 
intersection  =  point  on  surface  at  (u,v) 
t  =  distance  from  ray  origin  to  intersection  point 

} 

} 

Figure  4.3  Algorithm  for  Newton  Method 
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The  intersection  procedure  which  is  the  heart  of  the  whole  algorithm  is  shown 
in  Figure  4.4.  The  bicubic  surface  primitive  engine  (BiSPE)  is  based  on  this 
intersection  procedure.  This  algorithm  which  executes  the  task  of  the  BiSPE  is 
suitable  for  parallel  implementation  and  will  be  explored  in  detail  in  chapter  5.  The 
algorithm  follows  Thomas's  [THOM83]  approach  for  collecting  the  intersection  depth 
results  from  the  primitive  processors  and  for  determining  which  surface  is  visible 
along  the  current  ray.  The  intersection  with  the  maximum  z  coordinate  is  the  visible 
surface  on  the  image  plane. 

4.4  Lighting  Model 

Once  a  rendering  algorithm  has  resolved  the  geometric  question  of  what  spot 
on  what  object  a  pixel  represents,  it  needs  to  resolve  what  color  that  spot  appears  to 
the  observer.  The  visible  surface  for  a  pixel  is  a  small  patch  of  surface.  The  shading 
model  takes  into  account  the  composition,  direction,  and  geometry  of  the  light 
sources,  the  surface  orientation  and  the  surface  properties  of  the  object.  Surface 
normals  can  be  used  to  indicate  the  shape  and  orientation  of  a  surface  patch,  which 
in  turn  is  used  to  determine  the  shading  value. 

Ray  casting  is  the  process  of  determining  which  object  in  a  scene  is  first 
encountered  by  a  ray  emanating  from  the  camera  view  location  and  passing  through 
a  pixel  on  the  display  screen.  From  the  list  of  ray-object  intersections  the  first  visible 
intersection  is  chosen.  Since  there  is  no  further  tracing  the  rays  from  the  intersection 
point,  the  conventional  shading  model  due  to  Phong  [FOLE82,  ROGE85]  is  used 
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intersect_surface  (surface,  ray,  intersection) 
{ 

initialize  empty  list 

add  root  node  of  surface's  tree  to  list 

while  (list  not  empty)  { 
remove  next  node  from  list 
if  (node  =  leaf  node)  { 
call  Newton  process 

if  (intersection  found)  &&  (closer  than  previous  ones) 
save  it  as  int_found 

} 

else  for  each  child  of  node 
if  (ray  intersects  bounding  box) 
add  child  node  to  list 
if  (int_found  <  min_dist_to_nextnode)  exit  loop 

} 

} 


Figure  4.4  Ray/Bezier  Surface  Intersection  Algorithm 
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with  ambient  or  background  lighting,  diffuse  shading  due  to  surface  orientation  in 
relationship  to  the  light  source,  and  specular  reflection  or  highlights  due  to  the 
location  of  the  light  source  and  the  eye  as  well  as  the  surface  orientation. 

Phong's  equation  for  the  intensity  of  each  pixel  consists  of  the  sum  of  three 
terms  given  by 


^      ^ambient  *  ^diffuse  ^  ^specular 


=  kl  +    +   

I, 

=  k  I  +         (fc.cose  +  kcos"^) 


(4.18) 


where  kj,  and  k,  represent  the  ambient,  diffuse  and  specular  coefficients 
respectively  for  the  object  and  0  <  k„ka,k,  <  1,  and  /,  are  the  intensity  of  the 
ambient  light  which  is  a  constant  throughout  the  given  scene  and  the  intensity  of  the 
light  source  respectively,  d  represents  the  distance  from  the  perspective  viewpoint  to 
the  object,  K  is  an  arbitrary  constant  that  is  included  to  prevent  the  denominator 
from  approaching  zero  when  d  is  small,  N  is  the  unit  normal  to  the  object  at  the 
point,  L  is  the  unit  vector  having  the  direction  to  the  light  source,  Fis  the  unit  vector 
from  the  eye  (origin)  to  the  object,  S  is  the  unit  vector  of  the  specularly  reflected  ray, 
6  is  the  angle  between  N^and  L,  and  <t>  is  the  angle  between  Vand  S  (see  Figure  4.5). 

Diffuse  and  specular  components  of  the  lighting  model  are  based  on  surface 
orientation.  An  object  face  perpendicular  to  the  direction  of  the  light  source  is 
illuminated  differently  from  one  whose  face  is  angled  to  the  direction  of  the  light 
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source.  The  normal  to  the  surface  at  the  point  of  intersection  describes  the  surface 
orientation.  Each  point  in  the  scene  visible  to  the  viewer  must  have  the  normal  to 
the  surface  at  that  point  computed  prior  to  the  lighting  model  implementation. 

For  evaluating  the  intensity  of  the  light  at  each  pixel,  three  color  components 
of  the  intensity  should  be  separately  calculated.  For  an  RGB  video  monitor,  color 
components  are  red,  green,  and  blue.  Parameters  for  intensity  and  reflectivity  then 
become  three-element  tuples,  with  one  element  for  each  of  the  color  components. 
For  instance,  the  triple  representing  the  coefficient  of  diffuse  reflection  is  (k^^ 
k^.  However,  since  the  constant  value  k,  for  specular  reflection  is  not  dependent 
on  the  color  of  the  surface,  k^  is  the  same  for  all  three  color  components.  Hence,  the 
intensity  calculation  breaks  into  three  equations: 

(4.19) 

h  =  kj^  +  -^*{k^{N.L)  +  k^iVm 
a+K 

4.5  Results  and  Discussion 
The  ray  tracing  technique  is  an  object  oriented  algorithm.  Rather  than  group 
the  software  by  procedure,  we  group  it  by  data  structure.  Thus,  instead  of  collecting 
all  the  intersection  methods  into  one  file  and  all  the  normal  vector  formulas  into 
another,  we  split  the  problem  the  other  way,  collecting  procedures  for  each  primitive 
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Figure  4.5  Diffuse  and  Specular  Reflections 


60 

type  into  a  file  of  its  own.  If  there  was  more  than  one  primitive  such  as  spheres, 
polygons,  there  would  be  one  file  containing  patch-related  routines,  one  file  for 
sphere-related  routines,  another  for  polygon-related  routines,  etc.  This  has  the 
advantage  that  primitive-dependent  information  can  be  hidden  in  data  structures 
local  to  each  file  and'tlie  procedure  interfaces  can  be  very  simple  and  generic. 
Adding  new  primitives  to  the  system  becomes  easy  with  this  scheme.  Since  the 
details  of  each  primitive's  data  structure  will  be  local  to  that  sub-module,  all 
operations  on  the  primitive  are  supported  by  generic  procedures.  Each  object  shape 
is  defined  using  its  individual  parameters.  The  intersection  routine  is  object  oriented 
also.  Each  object  shape  uses  different  procedural  techniques  to  determine  an 
intersection. 

4.5.1  Complexity  and  Performance  Analysis 

In  this  section,  the  precision  required  in  the  critical  parts  of  the  algorithm  is 
examined.  The  algorithm  is  bound  almost  exclusively  by  floating  point  speed,  not  by 
storage  management,  not  by  dataflow  complexity,  and  not  by  program  control 
complexity. 

We  intersect  each  ray  with  the  environment  by  simply  testing  each  and  every 
primitive  object  retaining  the  nearest  point  of  intersection,  if  one  exists.  This  has  a 
time  complexity  which  is  linear  in  the  number  of  objects.  All  the  intersections  are 
sorted  in  order  of  increasing  parameter  t,  and  the  first  such  intersection  is  returned. 
To  find  the  intersections,  we  need  to  solve  two  simultaneous  equations  depicted  in 
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equation  4.11.  The  number  of  floating  point  operations  (flops)  necessary  at  each  step 
solving  these  equations  are  summarized  as  follows. 

1.  Calculation  of  gi(w,v)  and  gi{u,v). 

a.  Computation  of  the  coefficient  matrix,  C  =  M  P  \f.  Three  4x4  matrix 
multipHcation  requires  40  subtractions,  16  additions,  and  24  multiplications 
(knowing  M  contains  I's  and  O's  in  certain  positions,  the  calculations  are 
simplified).  This  is  just  for  one  component,  therefore  for  three  of  them  it  adds 
up  to  120  subtractions,  48  additions,  and  72  multiplications. 

b.  Computation  of  x(ii,v),  y(u,v),  z(m,v).  It  takes  4  multiplications  to  calculate  „^ 
li,  v^,  and  v'.  Multiplication  of  1x4  U  matrix  with  4x4  C  matrix  and 
multiplication  of  resultant  with  4x1  V  matrix  requires  15  multiplications  and 
15  additions  for  one  component  bringing  the  total  to  49  multiplications  and  45 
additions  for  all  three  components. 

c.  The  solution  of  equation  4.11,  once  the  necessary  terms  are  calculated, 
requires  2  multiplies  and  2  subtractions. 

2.  To  solve  the  equations  above,  we  need  to  calculate  Newton  iteration  procedure. 

a.  Calculation  of  partial  derivatives  for  x,  y,  and  z  components  requires  49 
multiplications  and  39  additions. 

b.  Calculafion  of  partial  derivatives  for  gi  and  needs  6  multiplies  and  4 
subtractions. 

c.  Calculation  of  det(J)  requires  2  multiplications  and  1  subtraction. 

d.  Calculation  of  D  requires  1  division. 
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e.  To  solve  equation  4.17,  6  multiplications  and  4  subtractions  are  needed. 
3.  Computing  t  takes  1  addition. 

It  is  not  hard  to  see  that,  taken  at  its  simplest  level,  ray  casting  is  highly 
computationally  expensive.  These  results  are  for  ray/patch  intersection  for  one 
iteration.  Since  it  takes  only  one  iteration  90%  of  the  time  for  Newton  iteration  to 
converge,  the  total  number  of  single  precision  floating  point  operations  are  tabulated 
and  are  shown  in  Table  4.1. 

For  a  resolution  of  1024x1024,  there  are  one  million  rays  need  to  be  traced 
per  frame  at  30  frames  per  second  to  achieve  real-time  performance.  A  real-time 
processor  has  to  dispatch  the  computation  for  each  ray  in  33ms.  Thus,  if  only  10 
floating  operations  are  required  per  ray,  real-time  ray  casting  requires  a  300 
megaFlop  processor.  As  we  have  seen,  we  have  at  worst  case  approximately  448 
floating  point  operations  per  ray/patch  intersection.  Calculation  yields  a  complexity 
of  3  Hr/frame  on  a  quarter  megaFlop  machine  (VAX  11/780).  The  state  of  the  art 
for  general  purpose  machines  is  roughly  200-1000  megaFlops.  It  is  clear  that  in  order 
to  realize  real-time  ray-traced  images  we  need  substantial  advances  both  in  algorithm 
development  as  well  as  in  special  purpose  hardware  design. 

In  this  chapter,  the  analysis  of  the  algorithm  is  given  without  taking  advantage 
of  any  coherence.  It  seems  that  incorporating  very  strong  ray-to-ray  coherence  is  a 
must  for  a  practical  ray  tracing  system.  Incoherence  can  be  an  advantage  when 
seeking  to  parallelize  the  algorithm.    The  opportunity  for  virtually  unbounded 
parallelism  is  afforded  by  the  fact  that  the  each  pixel  computation  proceeds 
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Table  4.1  Number  of  Floating  Point  Operations  for  Ray/Patch  Intersection 

Term  Multiplication     Addition       Subtraction  Division 

gi,  &  123  93 

Ur+i,  v,,i  61  39 
t  1 

Total  184  133 


122 

8  1 
130  1 


independently  of  all  others.  Because  it  uses  well  known  numerical  procedures,  fast 
floating  point  pipelines  will  without  any  difficulty  directly  accelerate  the  speed  of 
image  generation. 

There  are  several  optimizations  which  are  carried  out  on  the  ray/surface 
intersection  calculations.  First,  rays  are  transformed  to  lie  along  the  z-axis,  then  the 
same  transformation  are  applied  to  each  candidate  object,  so  that  any  intersection 
occurs  at  X  =  y  =  0.  This  simplifies  the  intersection  calculation  and  allows  the 
closest  object  to  be  determined  by  a  2  sort.  The  intersection  point  can  then  be 
transformed  back  for  use  in  shading  calculations  via  the  inverse  transformation. 
Second,  by  not  computing  the  normal  in  the  intersection  routine,  we  make  each 
intersection  test  cheaper.  Finally,  using  integer  arithmetic  helps  speed  some 
computations. 
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4.5.2  Simulation  Results 

A  ray  casting  image  rendering  system  has  been  designed  to  make  the  code 
easily  expandable  and  maintainable.  The  validity  of  the  algorithm  was  verified  by 
software  simulation.  The  algorithm  is  implemented  in  the  C  language  on  an  IBM 
AT/386  computer  with"  a  80387  floating  point  processor.  An  object  oriented  design 
philosophy  was  used  for  the  program,  and  the  resulting  code  has  been  made  robust. 

A  bicubic  Bezier  surface  is  presented  to  the  program  as  three  matrices  of  x, 
y,  and  z  separately.  The  memory  requirements  for  the  bicubic  Bezier  surface  are 
determined  largely  by  the  size  of  the  tree  of  bounding  boxes.  Performance 
characteristics  for  various  surfaces  are  given  in  Table  4.2.  Although,  CPU  times  are 
often  misleading,  being  dependent  on  programming  style,  code  optimization,  and 
hardware,  we  still  supply  these  in  order  to  give  an  indication  of  the  code  complexity. 

One  of  the  major  advantages  of  ray  tracing  algorithms  is  that  they  provide  a 
uniform  geometric  framework,  the  line/surface  intersection,  for  displaying  variety  of 
different  surface  types.  Unfortunately,  this  increased  generality  increases  the 
computation  time  significantly.  Therefore,  we  consider  implementing  some  part  of 
the  algorithm  in  hardware.  One  approach  is  to  exploit  the  independence  of  the 
calculations  from  pixel  to  pixel  and  perform  these  calculations  in  parallel.  The  pixel 
level  approach  of  the  proposed  architecture  reduces  the  geometry  that  is  typically 
involved  for  solving  intersection  problems  to  the  comparison  of  depth  values. 
Hardware  implementation  of  the  primitive  engine  algorithm  finding  the  roots  of 
polynomial  very  quickly  and  in  a  robust  manner  is  discussed  in  the  next  chapter. 
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Table  42  Performance  Characteristics  for  Various  Surfaces 


Surface 

1 eapoi 

OUilCi  c 

Onii  phnut^ 

Number  of  natches 

32 

8 

16 

Number  of  control  vertices/patch 

16 

16 

16 

Total  number  of  ravs 

64000 

40000 

64000 

Average  boxes  checked/ray 

44.6 

24.32 

31.12 

Average  Newton  caHs/ray 

0.78 

0.89 

0.87 

Average  iterations/call 

1.08 

0.92 

1.03 

Modelling  time 

2% 

1% 

1% 

Bounding  box  intersect  time 

46% 

45% 

48% 

Newton  time 

23% 

28% 

25% 

Shading  time 

15% 

14% 

11% 

Miscellaneous  time 

14% 

12% 

15% 

Total  time  (hrs:mins:secs) 

0:19:23 

0:8:07 

0:12:45 

'  Control  vertices  data  for  teapot  is  taken  from  IEEE  CG&A  magazine,  Vol.  7, 
No.  1,  January  1987. 

Control  vertices  data  for  doughnut  and  sphere  is  taken  from  IEEE  CG&A 
magazine,  Vol.  7,  No.  4,  April  1987. 
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CHAPTER  5 

THE  PARALLEL  IMPLEMENTATION  OF  THE  ALGORITHM 

Ray  casting  algorithm  has  the  advantage  of  being  very  simple,  highly  parallel 
in  nature,  and  calculation  intensive  enough  to  produce  good  speedup  even  when  each 
processor  is  assigned  to  a  different  pixel.  To  accelerate  ray  tracing  some  of  the 
operations,  especially  intersection  computation,  can  be  performed  in  parallel  since 
the  computation  of  each  pixel  is  independent  of  the  others.  This  fact  allows  the  ray 
intersection  of  each  primitive,  a  bicubic  Bezier  surface  in  our  case,  to  be  computed 
in  parallel.  Each  surface  is  assigned  to  a  primitive  engine  (processor)  which 
determines  the  intersection  depths.  Then,  these  intersection  results  are  passed  to 
another  processor,  depth  comparison  logic,  whose  task  is  to  determine  the  visible 
surface  by  sorting  the  intersections  in  depth.  If  the  ray  doesn't  intersect  the  surface, 
that  is  the  intersection  is  behind  the  observer  and  should  not  be  displayed,  the  roots 
of  the  polynomial  are  complex  and  no  results  are  passed  to  the  depth  comparison 
logic.  If  the  ray  is  tangent  to  the  surface,  one  real  value  is  passed  to  depth 
comparison  logic.  If  the  ray  intersects  the  surface  in  two  or  three  points,  the  real 
values  are  passed  to  the  depth  comparison  logic. 

The  algorithm  presented  in  the  last  chapter  for  the  primitive  engine  was 
developed  to  efficiently  use  the  low  cost  parallel  processing  possible  in  VLSI.  In  this 
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chapter,  bicubic-surface  primitive  engine  (BiSPE)  based  on  bicubic  surface  concepts 
is  presented.  The  whole  architecture  called  GraPE  configuring  the  system  in  a 
pipeUne  fashion  with  the  primitive  processing  units  (PEs)  implemented  in  parallel  is 
explored  by  Iskandar  [ISKA90].  The  system  presented  in  his  dissertation  can  be 
summarized  as  follows."  A  database  manager  at  the  top  of  the  hierarchy  distributes 
several  objects  to  several  pipelined  viewing-transformation  engine  (VTE)  and  object 
processor  (OP)  combinations  whose  outputs  are  then  distributed  to  a  collection  of 
primitive  engines  (PEs).  The  PEs  broadcast  the  results  of  their  computations  such 
as  color,  intensity  and  depth,  to  the  compositing  network  designed  by  Fleischman 
[FLEI88].  This  pipelined-parallel  GraPE  architecture  is  shown  in  Figure  5.1.  It  is 
the  responsibility  of  the  compositing  network  to  combine  these  pixel  attributes  into 
the  final  image.  Iskandar  claims  that  a  workable  system  can  be  attained  with  as  little 
as  one  object  processor  and  one  set  of  primitive  engines  such  that  the  set  must 
contain  one  engine  for  each  kind  of  primitive  supported  by  the  system.  Therefore, 
BiSPE  design  appropriate  for  VLSI  implementation  can  be  incorporated  into  GraPE 
primitive  engine  as  part  of  the  bicubic  surface  generation  unit.  The  architecture  then 
can  be  connected  to  the  compositing  network.  The  details  of  the  connection  still 
need  to  be  worked  out. 

In  implementing  the  algorithm  in  parallel,  first  step  is  to  calculate  the 
coefficients  of  parameters  u,  v  in  equation  2.3  and  find  out  how  many  additions, 
shifts,  multiplications,  etc.  we  have.  Once  the  coefficients  are  known,  the  second  step 
is  the  solution  of  the  non-linear  equations.  This  might  require  more  computational 
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Figure  5.1  Pipelined-Parallel  GraPE  Architecture 
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power  from  each  primitive  engine  than  is  feasible  to  place  on  a  single  chip.  That's 
why  we  need  some  simplifications  that  can  be  applied  to  the  computation  of 
coefficients  and  solution  of  the  equations.  The  usage  of  coherence  and  forward 
differencing  results  in  simple  enough  hardware  such  that  one  or  more  primitive 
processors  could  be  placed  on  a  chip.  This  chapter  examines  these  techniques  to 
reduce  the  computation  given  in  chapter  4. 

5.1  Coherence 

Although  an  algorithm  which  treats  all  rays  in  the  same  way  has  the  advantage 
of  being  uniform,  its  efficiency  is  limited  without  taking  advantage  of  coherence.  It 
is  worth  investigating  whether  coherence  can  be  kept  in  the  parallel  implementation, 
and  if  so,  to  what  degree.  One  must  realize  that  as  the  number  of  processors  in  a 
system  increases,  the  granularity  of  coherence  is  reduced.  The  primary  coherence 
properties  are  scanline  to  scanline,  pixel  to  pixel  and  area  coherence.  In  this 
dissertation,  the  first  two  are  used.  In  the  former,  adjacent  scanlines  intersect  nearly 
the  same  set  of  edges  which  are  then  updated  incrementally  from  scanline  to 
scanline.  In  the  latter,  for  interpolation  across  a  scanline,  a  pixel  would  have  a  small 
change  in  its  value  from  its  neighbors  if  it  is  from  the  same  span. 

The  coherence  along  a  scanline,  that  is  pixel  to  pixel  coherence,  is  in  the  fact 
that  only  one  parameter  in  the  ray  definition  changes  as  the  ray  is  cast  through  the 
sequence  of  pixels.  The  algorithm  takes  advantage  of  the  coherent  structure  of  a 
scene  to  reduce  the  computations.  The  idea  simply  is  that  adjacent  pixels  on  a 
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scanline  are  likely  to  have  the  same  characteristics.  In  our  case,  the  set  of  visible 
object  spans  determined  for  one  scanline  of  an  image  typically  differs  little  from 
those  of  the  next  scanline.  Similarly,  other  properties  of  each  scanline  are  quite  like 
those  of  the  preceding  scanline.  Calculations  performed  are  dramatically  reduced 
in  the  algorithm  using  these  characteristics.  Since  the  root-finding  algorithm  requires 
an  initial  value,  coherence  is  exploited  by  beginning  v^th  the  previous  scanline's 
solution  for  the  current  scanline. 

During  image  generation,  the  sample  points  on  the  image  space  are  processed 
in  scanline  order.  At  the  beginning  of  each  scanline  processing,  every  bounding  box 
tree  is  traversed.  gi(u,v)  (see  equation  4.11)  has  to  be  calculated  only  once  for  a 
sample  point.  Similarly,  g2iu,v)  only  has  to  be  computed  once  for  a  whole  line  of 
sample  points.  In  processing  a  new  sample  point,  instead  of  calculating  g,(M,v)  from 
equation  4.11  again,  it  is  updated  by  the  increment  calculated  with  the  forward 
differencing  method.  For  each  scanline,  g2(u,v)  is  also  obtained  incrementally.  This 
way,  a  good  deal  of  computation  is  saved. 

5.2  Forward  Differencing  Technique 
An  efficient  way  of  repeatedly  evaluating  the  bicubic  is  by  using  forward 
differences  [FOLE82].  Forward  differencing,  an  incremental  technique,  is  used  to 
reduce  the  hardware  required  to  implement  the  primitive  processor.  This  technique 
for  rendering  curves  and  surfaces  with  adaptive  subdivision  is  explored  in  [LIEN87] 
and  [SHAN88].  In  these  papers,  shading  and  image  mapping  on  a  bicubic  surface 


71 

is  performed  by  drawing  many  curves  very  close  to  each  other  so  that  no  pixel  gaps 
remain  between  them.  Livadas  et  al.  [LIVA88]  used  this  incremental  technique  for 
the  computation  of  quadratic  surfaces.  In  this  section,  the  scanline  parameters  at  the 
beginning  of  each  scanline  are  updated,  taking  advantage  of  coherence,  using  forward 
differencing  technique.'  The  scanline  parameters  independent  of  the  vertical 
parameter  are  constant  from  scanline  to  scanline  and  thus  require  no  computation 
between  scanlines.  Computation  of  the  forward  differencing  steps  follows. 
The  forward  difference  of  a  function  /(/)  is 

A/(0  =/(r+6)  -/(O,     6>0  (5.1) 

We  can  rewrite  this  as 

fit^b)  =  m  ^  Am  (5.2) 

So,  if  we  know/(0  and  A/(/)  we  can  determine 6).  Rewriting  the  above  equation 
in  iterative  terms,  we  see  that  this  is 

=  /„  -  A/„  (53) 
where  we  evaluate /in  constant  steps  of  size  6,  so  n  and  t  are  related  by  t„  =  Sxn, 
and/,  =/(0. 

For  a  bicubic  surface  Qiu,v)  =  UCV^,  the  forward  differences  are 
A^x{u,v)  =  x(u,v+6)  -  xiu,v) 

aIxM  =  \x(u,v+5)  -  A^xiu,v)  (5.4) 
a';c(«,v)  =  aJx(«,v+6)  -  aJx(«,v) 


72 

The  equations  above  are  forward  differences  on  v.  Forward  differences  on  u  are 
calculated  similarly.  A,x(u,v)  is  a  second-degree  polynomial  in  v  when  u  is  kept 
constant.  x(m,v)  is  linear  in  v  whereas  the  third  forward  difference  is  a  constant, 
so  further  forward  differences  are  not  needed.  • 

In  the  algorithm',  the  initial  condition  calculations,  that  is  u  =  0,  v=0,  are  done 
by  direct  evaluation  of  the  equations.  If  we  put  these  equations  into  a  matrix  D  as 
shown  below 


D,  = 


x(0,0)  A^x(0,0)  A;x(0,0)  A>(0,0) 

A„x(0,0)  A„(A^x(0,0))  A„(A;x(0,0))  A„(A^x(0,0) 

A^x(0,0)  A^(A^x(0,0))  A^(A;x(0,0))  AliAlx(0,0) 

A^x(0,0)  A^(A^a:(0,0))  aJ(A;x(0,0))  a'^CaJxCO.O) 


(5.5) 


then 


D  = 


10  0  0 

0   6  6^  6^ 

0  0 

0  0  0  66^ 


^33    ^32   ^31  ^30 


^23    ^^22   ^21  ^20 


^13    ^12   ^11  ^10 


^03   ^02   ^01  ^00 


10  0  0 

0   6  0  0 

0  6^  26^  0 

0  6^  66^  66^ 


(5.6) 
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Rewriting  the  equation  5.6, 


D  =  £(6)  C  £(5) 


(5.7) 


where  e  and  S  are  the  parametric  inaements  in  the  u  and  v  directions,  respectively 
and  C  is  the  4x4  coefficient  matrix.  Because  we  are  dealing  with  three  functions, 
xiu,v),  yiu,v),  and  z(m,v),  there  are  three  corresponding  sets  of  initial  conditions, 
=  £(€)  QEiS),  D,  =  E(e)  C,E(S),  and  A  =  E(e)  CM^)-  We  will  concentrate  on 
the  X  component  since  the  calculation  of  others  are  exactly  the  same. 

Computation  of  x(0,v)  and  x(u,0)  in  increments  of  S  and  c,  respectively  is 
done  by  applying  a  forward-step  operation  F  to  the  matrix  Then,  we  obtain  the 
matrix  below 


DD.  = 


^00  ^'^lO  ^01^^11  '^02+<^12  ^03*^^13 
^10+^  ^11+<^21  ^12+^  ^13 +'^23 
'^20-' ^30   ^21^^31    ^22-^^32   ^23 -"^33 


*30 


*31 


*32 


*33 


(5.8) 


where 


F  = 


110  0 
0  110 
0  0  11 
0  0  0  1 


(5.9) 
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After  a  forward-step  operation  is  applied,  the  first  row  of  has  the  terms  xie,0), 
A;c(€,0),  Ajx(€,0),  and  A^(e,0).  These  quantities  are  now  used  to  compute  x(€,v), 
using  forward  differences.  Applying  F  repeatedly,  the  first  row  of  is  used  to 
compute  x(2e,v),  and  so  on.  Substituting  column  for  row,  that  is  transposing  D„  and 
applying  F  to  D„  we  calculate  x(u,0),  x(u,5),  x(u^5),  and  so  on.  The  procedure 
calculating  forward  differences  is  shown  in  Figure  5.2. 

Using  forward  differencing,  a  polynomial  of  degree  n  can  be  evaluated  at  a 
sequence  of  equally  spaced  points  by  only  n  additions  [CONT65].  This  is 
considerably  better  than  evaluating  the  polynomial  all  over  again.  However,  error 
accumulation  can  be  an  issue  with  this  approach,  and  sufficient  fractional  precision 
must  be  carried  to  avoid  it  [BART87]. 

The  forward  difference  basis  allows  parallel  additions  which  are  suitable  for 
a  pipeline  implementation.  We  compute  the  intersection  at  one  pixel  and  apply 
forward  differencing  to  get  the  next  pixel.  The  e  step  size  for  u  and  S  step  size  for 
V  are  adjusted  so  that  the  curve  steps  along  in  approximately  one  pixel  steps  in  screen 
coordinates.  Forward  differencing  method  has  the  advantage  of  producing  picture 
quality  equivalent  to  adaptive  subdivision  without  the  memory  stack  management 
overhead  of  recursive  subdivision.  The  relatively  poor  performance  of  adaptive 
subdivision  is  due  to  the  fact  that  a  subdivision  operation  takes  significantly  more 
computation  than  a  forward  difference  operation.  Forward  differencing  also  makes 
patch  rendering  performance  competitive  with  polygon  rendering.  The  technique 
operates  in  u,  v  space  but  polygon  methods  operate  in  the  screen  scanline  order, 
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cubic_visible_surface  algorithm 
{ 


for  i  =  1  to  num_patches  { 
read  patches  / 

} 


*  P  P  P  */ 


for  i  =  1  to  num_patches  { 
calculate  coefficient  matrices  /*  C„  Cy,  Q  */ 

} 


for  i  =  1  to  num_patches  { 
D,  =  E(€)  X  C,  X  E(5f ; 

=  E(e)  X      X  E(6f ; 
D,  =  E(e)  X  Q  X  E(5f ; 


/*  get  initial  values  */ 


DU,  =  F  X  D,; 
DUy  =  F  X  Dyi 
DU,  =  F  X  D,; 


D,  =  D7;     Dy  =  D 

DV,  =  F  X  D,; 
DVy  =  F  X  Dy; 
DV  =  F  X  D  • 


/*  calculate  x(u,v),  y(u,v),  z(u,v)  */ 
/*  in  increments  of  e  */ 


D,  =  D, 


/•  calculate  x(u,v),  y(u,v),  z(u,v)  */ 
/*  in  increments  of  5  */ 


for  i  =  1  to  num_scanlines  { 
update  scanline  parameters 

for  j  =  1  to  num_pixels  { 
calculate  intersection 
determine  lighting 
color  pixel 

increment  scanline  parameters 

} 

increment  inter-scanline  parameters 

} 


Figure  5.2  Procedure  for  Calculating  Forward  Differences 
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therefore,  once  the  initial  parameters  are  found,  forward  differencing  does  not 
require  a  transformation  from  screen  space  to  image  coordinates  as  the  polygon 
method  does. 

5.3  Pipelined  and  Parallel  Architecmre 

For  applications  that  require  fast  transformation  such  as  ray  casting  and 
rendering  of  complex  shaded  images,  a  parallel,  pipelined  multiprocessor  architecture 
can  be  used  to  achieve  high  throughput.  A  parallel  architecture  offers  an  increase 
in  speed  that  grows  almost  linearly  with  the  number  of  processors  used.  To  extract 
the  maximum  efficiency  from  a  parallel  architecture,  key  bottlenecks  must  be 
identified  and  an  architecture  that  is  best  suited  to  overcoming  those  bottlenecks 
must  be  chosen.  In  a  pipelined  architecture,  each  processor  performs  a  unique 
operation  on  the  entire  data  set,  or  object.  Once,  it  has  completed  its  operation, 
such  as  rotating  an  object,  it  passes  the  data  to  the  next  processor  in  the  pipeline, 
which  performs  another  function,  such  as  clipping.  While  that  processor  is  clipping 
the  current  object,  the  previous  processor  can  be  performing  geometric 
transformations  on  the  next  object. 

The  algorithm  proposed  is  designed  utilizing  both  pipelining  and  parallelism 
to  achieve  real-time  performance.  It  is  developed  such  that  each  intersection 
processor  can  be  placed  in  a  VLSI  chip.  In  a  practical  display  system  several  of  these 
chips  may  be  used  to  recursively  calculate  surface-ray  intersections  at  pixels  or  groups 
of  pixels.    The  bicubic  surface  primitive  engine  (BiSPE)  has  the  capability  to 
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compute  the  intersection  of  a  ray  and  bicubic  surface  with  sufficient  throughput  to 
allow  real-time  interactive  display.  Pipelining  helps  the  system  achieve  high 
throughput.  The  next  computation  can  be  started  before  the  previous  result  is 
available,  but  by  the  time  the  last  circuit  is  reached,  the  result  must  be  there. 

The  bicubic  surface  computation  unit,  BiSPE,  is  responsible  for  computing  the 
depth  and  color  intensity  values  of  pixels.  A  hardware  implementation  of  the  depth 
computation  unit  consists  of  muhipliers,  adders,  and  shifters.  BiSPE  is  based  on  the 
algorithm  given  in  chapter  4  for  rendering  bicubic  surfaces.  The  underlying  principle 
behind  the  algorithm  is  the  representation  of  primitives  as  curved  surfaces,  not 
polygons.  The  block  diagram  of  BiSPE  is  shown  in  Figure  5.3.  A  parallel 
computation  in  x,  y,  and  z  was  chosen  because  it  operates  three  times  as  fast  as 
computing  these  values  one  after  another  and  there  appears  to  be  space  enough  on 
the  chip  to  allow  it.  Based  on  bounding  box  tests,  each  PE  computes  the  pixel 
representation  of  each  primitive  sent  to  it. 

The  BiSPE  engine  produces  two  values,  the  color  of  the  surface  at  the  display 
point  and  the  depth  of  that  point.  A  combining  circuit  then  determines  the  visible 
scene  dynamically  while  several  BiSPE  engines  produce  the  next  pixel's  color/depth 
information. 

5.4  Results  and  Discussion 
The  difficulty  in  achieving  a  real-time  performance  in  a  graphics  system  is 
largely  due  to  the  insufficient  bandwidth  to  the  image  memory.  Consider  a  1024  by 
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1024  display;  for  an  image  update  rate  of  30  frames  per  second,  each  pixel  on  the 
raster  display  must  be  updated  every  31  nanoseconds  or  less. 

Current  VLSI  fabrication  technology  makes  it  possible  to  fabricate  a  32-bit 
CPU  on  a  single  chip.  However,  to  achieve  high  performance  from  that  processor, 
the  architecture  and  ithplementation  must  be  carefully  designed  and  tuned.  The 
MIPS  processor  incorporates  some  new  architectural  ideas  into  a  single-chip,  NMOS 
implementation.  Processor  performance  is  obtained  by  the  careful  integration  of  the 
software  (e.g.,  compilers),  the  architecture,  and  the  hardware  implementation.  This 
integrated  view  also  simplifies  the  design,  making  it  practical  to  implement  the 
processor  using  the  NSF  supported  fabrication  facility  (MOSIS). 

One  method  of  improving  the  performance  of  ray  casting,  to  which  the 
method  is  particularly  well  suited,  is  to  introduce  parallelism.  Since  each  ray  is 
independent  of  its  neighbors,  it  can  perform  its  computation  in  parallel  with  others 
without  interference.  An  alternative  use  of  parallelism  would  be  for  each  object  to 
have  its  own  processor,  and  to  supply  on  demand  the  z  values  of  the  intersections  of 
a  given  ray  with  the  object.  This  is  analogous  to  the  "processor  per  polygon" 
approach  to  parallelism  in  conventional  rendering  but  at  a  higher  level  which  in 
principle  allows  more  advantage  to  be  taken  of  any  special  properties  of  the  object. 
An  advantage  of  ray  casting,  and  a  possible  justification  for  its  use,  is  that  it  avoids 
the  need  to  make  polygonal  approximations  of  curved  surfaces.  Given  an  analytic 
specification  of  the  surface,  all  that  is  necessary  is  that  it  should  be  possible  to  solve 
the  ray-surface  intersection  test. 
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5.4.1  Complexity  and  Performance  Analysis 

The  depth  computation  unit  (DCU)  or  bicubic  surface  primitive  engine 
(BiSPE)  analysis  examines  two  areas:  complexity  and  performance.  Complexity  is 
estimated  for  discrete  construction,  for  VLSI  fabrication,  and  for  gate-array 
construction.  Performance  is  examined  with  respect  to  image  resolution  and  DCU 
processing  speed. 

DCU's  complexity  is  measured  utilizing  gate  count  estimate  which  is 
determined  by  partitioning  the  DCU  conceptual  hardware  organization  into 
individual  functional  logic  blocks.  Then,  the  estimated  gate  count  of  each  functional 
block  is  determined  and  totaled  to  provide  a  gate  count  estimate  of  a  DCU.  This 
technique  provides  an  estimate  of  expected  complexity  for  integrated  circuit 
fabrication.  It  also  provides  an  estimate  of  board-level  complexity  for  an  off-the-shelf 
integrated  circuit  implementation,  which  is  determined  through  totaling  the  functional 
logic  package  types  used.  It  should  be  noted  that  performance  is  usually  enhanced 
for  a  realization  by  judicious  use  of  additional  gating,  which  may  alter  the  estimated 
gate  count. 

When  the  architecture  is  built  in  a  parallel  and  pipelined  fashion,  matrix 
multiplication  takes  considerably  less  computation  than  given  in  section  4.5.1.  If  we 
have  four  processors  running  in  parallel,  multiplication  of  the  first  row  of  M  (Bezier 
matrix)  with  the  first  column  of  P  (matrix  of  control  points)  requires  four  subtractions 
only.  This  computation  where  MP  is  MxP  is  given  below. 
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AfP[0][0]  =       -  3x20  +  3*10  - 

=  (x^-Xq^  -  2(^20 -JCio)  -  (x^-x^^ 

Calculation  of  first  row  of  M  with  the  other  columns  of  P,  namely  MP[0][1], 
MP[0][2],  and  MP[0][3],  is  exactly  the  same.  Multiplication  by  2  is  just  shifting  1 
bit  left  and  doesn't  require  extra  hardware  since  it  is  hardwired.  On  the  average, 
matrix  multiplication  when  it  is  done  in  parallel  takes  one  clock  cycle.  Pipeline 
(propagation)  delay  is  three  add  delays  and  after  every  delay  one  result  is  obtained. 
Computation  of  the  second  row  of  M  with  P  requires  three  adds  and  two  subtracts: 

MP[1][0]  =  3(x«,- 2X10+^20)  =  3[(^oo-^io)  ^  (^20-^10)] 

=  2[(x^-x,^  +  (J^ao-^io)!  ^  [(^00-^10)  ^  (^20-^10)1 

For  the  third  row,  it  takes  one  add  and  one  subtract: 

MP[2][0]  =  -3x^  ^  3x,, 

The  units  to  calculate  elements  of  each  row  are  shown  in  Figure  5.4.  The  results 
above  show  that  multiplying  three  4x4  matrices  costs  22  additions/subtractions  for 
one  component.  Computation  of  this  matrix  with  U  and  then  multiplication  of  the 
resultant  with  K requires  3  multiplies  and  3  adds  each.  So,  forx(u,v)  6  multiplies  and 
28  additions/subtractions  are  needed.  In  total,  calculation  ofx(u,v),_y(M,v),  and  2(u,v) 
requires  84  adds/subs  and  22  multiplies.  Rest  of  the  computation  is  the  same  as  the 
rest  of  the  computation  given  in  section  4.5.1.  The  number  of  floating  point 
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Figure  5.4.  Block  Diagram  of  the  Matrix  Computation  Unit 
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operations  when  the  intersection  calculation  is  implemented  in  parallel  is  shown  in 
Table  5.1. 

The  total  number  of  floating  point  operations  to  calculate  the  intersection  of 
a  ray  with  a  patch  at  a  pixel  with  one  iteration  is  177.  Clearly,  there  is  a  substantial 
saving  compared  to  448  floating  point  operations  when  the  algorithm  is  implemented 
sequentially.  The  gate  count  of  32-bit  full  adder  is  518  [KUCK78].  The  gate  count 
for  multiplier  is  15,284.  With  the  iterative  method,  division  requires  6  multiplications 
and  3  subtractions.  So,  for  177  operations  the  total  gate  count  is  approximately 
1,000,000.  In  fact,  gate  counts  of  5000  are  often  used  in  multipliers,  therefore  this 
circuit  can  be  realized  with  a  lot  less  than  1,000,000  gates. 

Because  of  the  finite  number  of  bits  used  in  representing  number,  every 
mathematical  computation  has  the  potential  of  introducing  a  round-off  error  in  its 
result.  To  prevent  this  error,  guard  bits  are  used.  This  means  operands  of  any 


Table  5.1  Number  of  Floating  Point  Operations  for  Ray/Patch  Intersection 


Term  Multiplication  Addition  Subtraction  Division 

gi,g2  20  42  44 

u,,i,  v,,i  33  27  9  1 

t  1 
Total  53  70  53  1 
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mathematical  function  must  be  represented  by  more  bits  than  the  result.  The 
number  and  type  of  functions  performed  dictate  the  extra  precision  (guard  bits) 
required  by  the  operands.  If  the  result  of  every  operation  is  rounded,  the  error  is 
plus  or  minus  one  half  of  the  least  significant  bit.  Thus,  for  a  number  represented 
by  n  bits,  the  worst  case"  error  is  Z^^"*^'.  For  multiplication  and  division,  the  least 
significant  bit  (LSB)  of  the  result  may  be  wrong.  Ergo,  for  these  operations, 
operands  with  higher  precision  than  the  result  must  be  used  to  counter  the  error. 
For  addition/subtraction,  the  largest  errors  occur  in  producing  differences  in  nearly 
equal  numbers.  These  operations  were  eliminated  when  possible. 

Depth  computation  is  handled  by  the  depth  computation  unit  (DCU).  Figure 
5.5  shows  the  components  of  DCU.  This  unit  solves  for  the  roots  of  equation  4.11. 
The  number  of  bits  required  for  each  DCU's  component  depends  on  the  required 
precision  at  the  output.  We  use  24  bits  for  the  depth  information,  since  the  depth 
resolution  of  24-bits  is  satisfactory  for  high-end  graphics  devices.  Since  only  positive 
values  are  used  and  negative  intersection  depths  are  ignored,  the  display  space  is 
limited  to  23  bits  of  precision  in  the  Z  dimension.  This  limit  dictates  the  precision 
on  the  X  and  Y  dimensions  to  be  23  also.  The  result  at  the  output,  namely  the 
parameters  u,  v,  and  /,  requires  23-bit  precision  as  discussed  above.  When  the 
precision  of  the  DCU's  components  is  traced  back  from  the  output,  we  get  32-bit 
precision  at  the  input. 

For  real-time  animation,  a  minimum  image  update  rate  is  10  frames  per 
second.  A  30-frames-per-second  image  update  rate  is  more  typical  for  high  quality 
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animation.  This  kind  of  image  update  rate  leaves  33  milliseconds  computation  time 
for  each  image.  This  is  enough  time  to  render  60  primitives  (patches).  We  can  see 
that  quite  complex  images  can  be  built  from  this  number  of  primitives.  For  the 
minimum  image  update  rate,  three  times  more  primitives  can  be  rendered.  Many 
more  primitives  can  be  rendered  if  the  clock  frequency  can  be  increased.  A 
100  MHz  clock  will  double  the  primitives  that  can  be  handled.  The  above 
computation  estimates  the  performance  of  one  primitive  engine.  Several  of  these 
primitive  engines  can  be  running  in  parallel.  The  potential  performance  of  the  whole 
system  then  will  be  the  performance  of  one  primitive  engine  multiplied  by  the 
number  of  primitive  engines  in  the  system. 

We  just  estimated  scene  complexity  by  the  number  of  primitives  that  can  be 
rendered  within  one-frame  time.  Another  measure  of  scene  complexity  is  based  on 
the  screen  resolution  that  can  be  managed  by  the  system.  This  can  be  found  by 
determining  the  pixel  time.  The  pixel  time  is  the  time  available  to  process  one  pixel. 
The  pixel  time  depends  on  both  screen  resolution  and  image  update  rate  as  depicted 
by  the  following  equation 

1 

processing  time  =   

(Image  update  rate)  *  (Screen  resolution) 

From  the  above  equation,  processing  times  for  different  image  resolutions  are  shown 
in  Table  5.2. 
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Table  52  Processing  Time  for  Various  Screen  Resolutions 


Screen  Resolution 


Processing  Time  (ns) 


640  X  480 
1280  X  960 
1280  X  1024 
1280  X  1280 
1400  X  1280 
2048  X  2048 


325.5 
81.4 
76.3 
61.0 
55.8 
23.8 


Note  :  Image  Update  Rate  =  10  frames  per  second 


Since  the  system  is  implemented  in  a  pipeline  fashion,  its  pixel  time  is 
determined  by  the  slowest  components  in  the  pixel  computation  unit.  These 
components  are  the  floating  point  add  or  subtract  units.  Using  the  i486  as  reference, 
a  floating  point  add  or  subtract  takes  5  clock  cycles. 

The  algorithm  requires  only  177  flops  per  iteration  step  compared  to  419  flops 
for  the  method  presented  in  Sweeney  and  Bartels  [SWEE86].  Our  algorithm  takes 
only  one  iteration  to  converge  whereas  theirs  take  two  or  more.  The  normal 
computation  takes  216  flops  when  theirs  require  407  flops.  Kajiya's  method  [KAJI82] 
takes  6135  flops  in  the  worst  case.  The  interval  Newton  method  [TOTH85]  involves 
similar  figures. 
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5.4.2  Simulation  Results 

Clearly,  the  intersection  of  a  ray  and  a  bicubic  Bezier  surface  is  not  a  simple 
task.  In  spite  of  the  complexity  of  the  mathematics  involved,  the  actual 
implementation  is  simple  enough  to  allow  convenient  hardware  implementation  of 
the  algorithm.  The  architecture  developed  to  implement  the  algorithm  for  the 
bicubic  surface  primitive  engine,  BiSPE,  is  shown  in  Figure  5.3. 

The  algorithm  correctly  handles  the  difficult  cases  of  the  ray  tangentially 
intersecting  a  planar  patch  and  intersections  of  the  ray  at  a  silhouette  edge  of  the 
patch.  It  can  be  broken  down  into  modules.  The  granularity  (the  size  of  tasks  which 
are  assigned  for  work)  of  the  modules  depends  on  what  kind  of  hardware  or 
architecture  they  are  implemented  on.  One  possibility  of  a  coarse  architecture  is  to 
implement  the  modules  as  firmware  for  a  general  purpose  microprocessor  with 
floating  point  support  such  as  the  Intel  i860  or  i486  (80486).  Of  course  this  type  of 
implementation  wastes  a  significant  amount  of  the  computing  power  and  real-estate 
because  of  the  general  nature  of  the  processor.  The  ideal  implementation  would  be 
a  medium-grain  architecture  with  only  the  time-consuming  tasks  implemented  in 
silicon.  This  is  a  type  of  microcoded  processor.  This  kind  of  architecture  makes  it 
possible  to  implement  the  algorithms  for  different  primitives  on  one  type  of 
processor.  A  fine-grain  implementation  is  where  every  step  in  each  algorithm  is 
hard-wired.  This  of  course  will  yield  the  fastest  processor.  However,  due  to 
complexities  of  the  algorithms,  this  kind  of  architecture  needs  a  considerable  amount 
of  silicon. 
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Depth  computation  unit,  DCU,  can  be  implemented  in  VLSI  by  today's 
technology.  The  difficulty  may  arise  in  implementing  several  DCU  in  a  single  chip. 
The  goal  is  to  implement  the  complete  system  on  a  single  board,  and  several  DCUs 
may  be  required  to  be  in  one  chip.  With  the  advancement  of  VLSI  technology,  this 
will  be  possible  in  near  future. 


CHAPTER  6 
CONCLUSION 

6.1  Summary 

This  dissertation  presents  the  derivation  of  a  ray  casting  algorithm  for  bicubic 
Bezier  surfaces.  A  paralleHzed  version  was  proposed.  The  major  feature  that 
distinguishes  the  algorithm  lies  in  the  way  the  primitives  are  rendered  directly  from 
the  bicubic  form,  while  most  other  systems  are  based  on  polygonal  primitives. 

I  had  two  goals  when  I  started  this  research.  One  was  to  do  the  intersection 
computation  in  short  time  so  that  the  use  of  patches  in  an  interactive  environment 
will  be  attractive.  My  second  objective  was  to  design  the  algorithm  so  that  it  can 
easily  be  implemented  in  VLSI  chips.  The  two  goals  set  at  the  beginning  of  the 
research  have  been  accomplished:  a  computationally  regular  and  efficient  process  for 
the  display  of  bicubic  Bezier  surfaces  is  devised,  and  a  novel  pipeline  with  hardware 
enhancements  for  image  compositing  and  pixel  production  schemes  is  utilized. 

The  algorithm  developed  is  simple,  efficient  and  fast.  The  method  presented 
here  compares  well  with  the  methods  given  in  Sweeney  and  Bartels  [SWEE86],  Toth 
[TOTH85],  and  Kajiya  [KAJI82].  The  research  focused  on  a  numerical  technique 
called  Newton  iteration  to  calculate  the  intersection  points  of  a  ray  and  a  bicubic 
Bezier  surface.  There  are  several  reasons  for  selecting  the  Bezier  representation. 
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The  Bezier  control  mesh  provides  a  rough  approximation  to  the  surface,  and  thus 
makes  designing  test  cases  relatively  simple  and  more  intuitive  than  if  some  other 
representation  had  been  chosen. 

In  this  research,  a  processor  is  assigned  to  a  primitive  in  the  display.  So,  a 
primitive  processor  is  the  intersection  computation  hardware  for  a  single  surface. 
The  points  of  the  intersection  are  the  roots  of  the  equation  Q(u,v)  -  Dt  -  RO  =  0 
in  terms  of  the  parametric  variables  u,  v,  and  /.  After  obtaining  the  roots  of  the 
above  equation,  coherence  and  forward  difference  are  used  to  reduce  the  number  of 
operations  required.  Usage  of  these  methods  results  in  simple  enough  hardware  such 
that  one  or  more  primitive  processors  can  be  placed  on  a  chip.  It  appears  that  the 
algorithm  is  simple  enough  to  be  implemented  in  a  VLSI  chip  in  the  future.  The 
purpose  of  the  chip  is  to  calculate  the  intersection  point  of  a  ray  with  the  curved 
surface.  Parameter  values  (u,v)  specify  the  location  of  the  intersection  on  the  patch, 
and  parameter  value  t  gives  the  location  of  the  intersection  on  the  ray.  The  faster 
image  creation  and  a  more  natural  user  interface  resulting  from  the  way  objects  are 
defined  in  the  database  are  the  foreseeable  advantages  of  the  proposed  design. 

Ray  casting  is  a  very  popular  technique  because  of  the  realistic  images  that 
can  be  produced.  A  great  variety  of  visible  surface  algorithms  have  been  developed 
over  the  years,  but  not  the  research  reported  here.  Not  much  progress  has  been 
made  on  hardware  techniques  for  rendering  curved  surfaces. 
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These  steps  are  followed  in  this  research: 

1.  An  algorithm  is  developed  to  calculate  the  intersection  of  the  ray  with  the  cubic 
surface. 

2.  The  computation  required  is  reduced  using  coherence  and  forward  differencing 
methods.  By  taking  advantage  of  image  coherence  the  average  number  of  iterations 
per  intersection  test  is  reduced  to  one. 

3.  Ray  casting  has  been  a  particularly  popular  candidate  for  parallel  implementation 
because,  in  its  simplest  form,  each  pixel  is  computed  independently.  The  algorithm 
is  designed  such  that  it  can  be  placed  in  a  VLSI  chip  taking  advantage  of  the 
parallelism  of  the  ray  casting  algorithm.  This  also  reduces  the  algorithm's 
computational  requirements. 

4.  The  algorithm's  complexity  is  studied  to  determine  its  computational  order. 
Moreover,  the  operations  in  the  algorithm  are  also  counted. 

Advantages  of  the  algorithm  are  that  it  is  conceptually  relatively  simple,  and 
the  code  produced  therefore  is  concise,  easy  to  understand  and  easy  to  generalize. 
Newton's  method  is  by  nature  approximative  and  this  can  be  a  disadvantage  of  the 
algorithm  proposed  here  since  imprecise  intersection  solutions  can  occasionally  lead 
to  problems. 
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6.2  Future  Work 

Many  improvements  or  extensions  of  the  algorithm  are  possible.  The  objects 
in  the  world  scene  may  be  extended  to  include  other  objects  such  as  spheres,  fractals. 
Each  object  type  may  be  defined  separately  and  individual  routines  for  the 
intersection  computation  may  be  included.  As  long  as  a  routine  may  be  written  to 
determine  the  point  of  intersection  and  the  normal  to  the  object  at  that  point,  the 
object  may  be  included  as  part  of  the  scene,  taking  advantage  of  the  object  oriented 
property  of  the  algorithm.  The  concept  of  bounding  volumes  could  be  extended  to 
group  objects  in  the  environment  which  are  physically  close  to  each  other.  A 
hierarchical  environment  could  be  created  of  the  entire  scene,  each  level  surrounded 
by  a  bounding  volume.  If  the  ray  intersects  the  bounding  volume,  the  components 
of  that  cluster  need  to  be  examined;  otherwise  several  objects  may  be  eliminated 
from  the  intersection  process.    The  model  file  would  supply  the  information 
regarding  the  groupings,  creating  some  work  for  the  user,  however,  the  time  savings 
of  the  ray  casting  process  is  substantial  [WEGH84]. 

The  Bezier  representation  is  actually  a  special  case  of  B-splines.  Extension 
of  the  algorithm  to  bicubic  B-spline  surfaces  would  only  involve  converting  the 
control  mesh  to  the  internal  power  basis  representation.  Ray  casting  algorithm  can 
be  extended  to  be  a  ray  tracing  algorithm  when  rays  are  further  traced  from  the 
intersection  point.  Surface  shading,  shadowing,  and  anti-aliasing  can  also  be  added 
to  make  the  algorithm  more  complete. 
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The  hardware  referred  to  in  this  research  was  simulated  in  software.  This  is 
required  to  determine  its  detailed  complexity,  possible  pipelining  structure  and 
register  requirements.  Hardware  implementation  of  a  primitive  processor  for 
ray/surface  intersection  is  the  next  logical  step. 
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